Reddit is full of bots: thread reposted exactly the same, comment by comment, 10 months later

Blaze@lemmy.blahaj.zone · 2 months ago

Reddit is full of bots: thread reposted exactly the same, comment by comment, 10 months later

Anti-Face Weapon@lemmy.world · 2 months ago

My understanding of how this works is that the left one is real accounts making real comments, at least in the majority.

Then when the link gets reposted, either by a bot or naturally, potentially depending on the title, the bots scrape the old comments and post them.

It’s content farming. And Reddit is probably okay with this.

moriquende@lemmy.world · 2 months ago

The right one is the “real” accounts. Notice how the left one is newer and all the accounts have names ending with four digits, except where they aren’t copies from the right.

Sternout@feddit.de · 2 months ago

No, the left one is older and most of the names on the right contain four numbers.

What’s going on here?

Maybe op updated the picture?

Blaze@reddthat.com · 2 months ago

I did, because other people complained in another comment that it was confusing to not have the older thread on the left.

Anyway, it’s pretty obvious which one is which

Sternout@feddit.de · 2 months ago

Thanks, I almost thought I was delusional

FiniteBanjo@lemmy.today · 2 months ago

I also thought you were, lmao.

Fuck_u_spez_@lemmy.world · 2 months ago

deleted by creator

SuddenDownpour@sh.itjust.works · 2 months ago

The list of names at the left creeps me the fuck out.

EldritchFeminity@lemmy.blahaj.zone · 2 months ago

I saw this exact same style of bot account years ago on Tumblr. They always follow the same naming scheme: one word or two words combined and then a string of 4 digits. I bet if you go to any of their profiles, you’ll find like 4 comments that are all copied from old threads and a bunch of upvotes on completely random subs, possibly even all of them being on other bot accounts’ posts and comments.

The real question is whether they’re being used to fake activity on Reddit, sway public opinion by posting this sort of political slant, or will they later be used to advertise scams and this is just to make them seem legitimate.

sep@lemmy.world · 2 months ago

Why not all of the above? If you have a service, you want to sell it to as many customers as possible.

livus@kbin.social · 2 months ago

Reddit is going to poison LLMs sooner than I thought.

postmateDumbass@lemmy.world · 2 months ago

LMAO while AIs reading training data sets get stuck in infinite loops.

bjorney@lemmy.ca · 2 months ago

Reddit probably omits bot accounts when it sells its data to AI companies

phdepressed@sh.itjust.works · 2 months ago

I doubt Reddit is in charge of many of the existing bots on their site.

bjorney@lemmy.ca · 2 months ago

Reddit has access to its own data - they absolutely know which users are posting unique content and which users’ content is a 100% copy of data that exists elsewhere on their own platform

phdepressed@sh.itjust.works · 2 months ago

I know they could, I’m just not sure they’re that competent. These bots often aren’t single user or just copy paste either, there’s usually some effort to mix it up or change wording slightly. Reddit’s internal search function is infamously shit, but they “know” which users are unlabeled bots with some effort put behind them?

bjorney@lemmy.ca · 2 months ago

I know everyone here likes to circle jerk over “le Reddit so incompetent” but at the end of the day they are a (multi) billion dollar company and it’s willfully ignorant to assume that there isn’t a single engineer at the company who knows how to measure string similarity between two comment trees (hint: import difflib in python)
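For anyone curious what the difflib hint above could look like in practice, here is a minimal sketch. It assumes each thread has already been flattened into a list of comment strings in display order; the function name `thread_similarity` and the sample comments are made up for illustration and say nothing about how Reddit actually does it.

```python
from difflib import SequenceMatcher


def thread_similarity(old_comments, new_comments):
    """Return a 0..1 similarity ratio between two flattened comment threads."""
    # Normalise case and whitespace so trivial edits do not hide a copy.
    old = [" ".join(c.lower().split()) for c in old_comments]
    new = [" ".join(c.lower().split()) for c in new_comments]
    # SequenceMatcher accepts any sequences of hashable items, so each
    # whole comment is treated as a single token here.
    return SequenceMatcher(None, old, new, autojunk=False).ratio()


# Hypothetical sample data.
original = ["Great photo!", "Where was this taken?", "Looks like Norway."]
repost = ["Great photo!", "Where was this  taken?", "Looks like Norway."]
unrelated = ["First", "This is a repost", "Mods?"]

print(thread_similarity(original, repost))     # copied thread
print(thread_similarity(original, unrelated))  # different thread
```

This only compares two threads that have already been picked out as a pair. Finding which pairs to compare across a whole site is a separate problem.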

icydefiance@lemm.ee · edit-2 2 months ago

To compare every comment on reddit to every other comment in reddit’s entire history would require an index, and if you want to find similar comments instead of exact matches, it becomes a lot harder to do that efficiently. ElasticSearch might be able to do it, but then you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much when people are leaving new comments, and that would probably be expensive.

Comparing combinations of comments is probably impossible. Reddit has a massive number of comments to begin with, and the number of possible subtrees of those comments would just be absurd. If you only care about comparing entire threads and not subtrees, then this doesn’t apply, but I don’t know how useful that will be.

Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
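To make the indexing point above concrete, here is a toy in-memory sketch of one common approach to "similar, not identical" lookup: break each comment into word 3-grams (shingles), keep an inverted index from shingle to comment id, and only run the expensive comparison against comments that share at least one shingle. The class name `NearDuplicateIndex`, the 0.3 threshold, and the sample comments are all assumptions for illustration. This is not how ElasticSearch or Reddit works internally, and it ignores the sync and storage costs described above.

```python
from collections import defaultdict


def shingles(text, n=3):
    """Return the set of word n-grams in text (lowercased)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class NearDuplicateIndex:
    """Toy inverted index: shingle -> ids of comments containing it."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold           # minimum Jaccard similarity
        self.by_shingle = defaultdict(set)   # shingle -> comment ids
        self.shingles_of = {}                # comment id -> shingle set

    def add(self, comment_id, text):
        sh = shingles(text)
        self.shingles_of[comment_id] = sh
        for s in sh:
            self.by_shingle[s].add(comment_id)

    def query(self, text):
        """Return sorted ids of indexed comments similar to text."""
        sh = shingles(text)
        # Candidates are comments sharing at least one shingle.
        candidates = set()
        for s in sh:
            candidates |= self.by_shingle.get(s, set())
        hits = []
        for cid in candidates:
            other = self.shingles_of[cid]
            jaccard = len(sh & other) / len(sh | other)
            if jaccard >= self.threshold:
                hits.append(cid)
        return sorted(hits)


# Hypothetical sample data.
index = NearDuplicateIndex()
index.add("c1", "the quick brown fox jumps over the lazy dog")
index.add("c2", "I had pancakes for breakfast this morning")
print(index.query("the quick brown fox leaps over the lazy dog"))
```

Note that swapping a single word already drops the Jaccard score from 1.0 to 0.4 in this example, which is part of why fuzzy matching at scale is harder than exact matching.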

livus@kbin.social · 2 months ago

Doubt it, they are interwoven into almost any conversation with more than 70 comments.

bjorney@lemmy.ca · 2 months ago

If you have access to the entire Reddit comment corpus it’s trivial to see which users are only reposting carbon copies of content that appears elsewhere on the site

livus@kbin.social · 2 months ago

The low level bots in OP’s screenshot, sure, because it’s identical. Not the rest.

I used to hunt bots on reddit for a hobby and give the results to Bot Defense.

Some of them use rewrites of comments with key words or phrases changed to other words or phrases from a thesaurus to avoid detection. Some of them combine elements from 2 comments to avoid detection. Some of them post generic comments like 💯. Doubtless there are some using AI rewrites of comments now.

My thought process is that if generic bots have been allowed to run so rampant that they fill entire threads, that’s an indication of how bad the more sophisticated bot problem has become.

And I think @phdepressed is right, no one at reddit is going to hunt these sophisticated bots because they inflate numbers. Part of killing the API use was to kill bot detection after all.

bjorney@lemmy.ca · edit-2 2 months ago

Reddit has way more data than you would have been exposed to via the API though - they can look at things like user ASN (is it coming from a datacenter), whether they were using a VPN, they track things like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.

“no one at reddit is going to hunt these sophisticated bots because they inflate numbers”

You are conflating “don’t care about bots” with “don’t care about showing bot generated content to users”. If the latter increases activity and engagement, there is no reason to put a stop to it. However, when it comes to building predictive models, A/B testing, and other internal decisions, they have a vested financial interest in making sure they are focusing on organic users. How humans interact with humans and/or bots is meaningful data; how bots interact with other bots is not.

criitz@reddthat.com · 2 months ago

It’s probably not as easy as you imagine for reddit to identify and cleanse all bot content.

livus@kbin.social · 2 months ago

Of course it’s not. Nor do they want to.

I think the person you’re talking to thinks all bots are like the easy ones in this screenshot.

bjorney@lemmy.ca · edit-2 2 months ago

Look at the picture above - this is trivially easy. We are talking about identifying repost bots, not seeing if users pass/fail the Turing test

If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
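The rule of thumb above can be sketched roughly as follows. This assumes comments are available as `(timestamp, author, parent_text, text)` tuples; that layout, the function names, and the sample accounts are hypothetical, and only exact word-for-word copies are caught, not rewritten ones.

```python
def normalise(text):
    """Lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def repost_fraction(comments, user):
    """Fraction of user's comments that copy an earlier (parent, text) pair.

    comments: list of (timestamp, author, parent_text, text) tuples.
    """
    # Record who first posted each (parent, text) pair.
    first_seen = {}
    for ts, author, parent, text in sorted(comments, key=lambda c: c[0]):
        key = (normalise(parent), normalise(text))
        first_seen.setdefault(key, (ts, author))

    mine = [c for c in comments if c[1] == user]
    if not mine:
        return 0.0

    copied = 0
    for ts, author, parent, text in mine:
        first_ts, first_author = first_seen[(normalise(parent), normalise(text))]
        # Count it only if someone else said the same thing, to the same
        # parent comment, strictly earlier.
        if first_author != user and first_ts < ts:
            copied += 1
    return copied / len(mine)


def looks_like_repost_bot(comments, user, threshold=0.99):
    return repost_fraction(comments, user) >= threshold


# Hypothetical sample data.
comments = [
    (1, "alice", "Nice sunset", "Where is this?"),
    (2, "bob", "Where is this?", "Looks like Norway."),
    (100, "Bold_Otter_1234", "Nice sunset", "Where is this?"),
    (101, "Bold_Otter_1234", "Where is this?", "Looks like Norway."),
]
print(repost_fraction(comments, "Bold_Otter_1234"))
print(looks_like_repost_bot(comments, "alice"))
```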

criitz@reddthat.com · 2 months ago

That’s easy in an isolated case like this, but the reality of the entire reddit comment base is much more complex.

Damage@feddit.it · 2 months ago

It’s account farming. They make fake accounts look legitimate so they can use them to influence opinions on the site.

livus@kbin.social · 2 months ago

They also use them in groups of 3 to lure people to malicious sites and scam sites. Especially fake merchandise sites.

kubica@kbin.social · 2 months ago

Basically replaying a thread to make it look like there’s activity in the sub.

runswithjedi@lemmy.world · 2 months ago

deleted by creator

Anti-Face Weapon@lemmy.world · 2 months ago

The left predates the right by 10 months

runswithjedi@lemmy.world · 2 months ago

deleted by creator

Pacrat173@lemmy.ml · 2 months ago

https://en.wikipedia.org/wiki/Dead_Internet_theory

I didn’t believe this when I first heard about it, but it’s looking more true every day

DahGangalang@infosec.pub · 2 months ago

Yeah, even if we’re not quite “there” yet, it feels like we’re at least moving in that direction

FiniteBanjo@lemmy.today · 2 months ago

Definitely depends on where you’re going. Certain Hexbear posts are such obvious bot networks, while some niche communities can remember what they wrote more than two comments ago.

fine_sandy_bottom@discuss.tchncs.de · 2 months ago

This gets posted all the time, and it’s frustrating that it lacks any nuance.

It’s just a spooky bedtime story… “imagine if everyone you talk to online is just a bot”

Yes a lot of online content is generated.

Yes it’s getting worse.

Yes there’s lots of bots.

However… you can choose where you spend your time online, and spend it with friends or likeminded people.

What I mean to say is, some communities on reddit are “mostly dead”, but you don’t have to go there.

RememberTheApollo_@lemmy.world · 2 months ago

Just paid a visit. It’s really gotten bad. Horrible titles that make little sense. People falling over each other to make tired quips instead of conversation, and the rest to point out how someone is wrong or one-up the commenter.

jkrtn@lemmy.ml · 2 months ago

That’s what it has been like for years now.

RememberTheApollo_@lemmy.world · 2 months ago

IMO it’s gotten markedly worse since the 3rd party app debacle. Perhaps that, combined with the advent of AI added to bots, has made it obvious. Yeah, it’s been on a decline for quite a bit with the repost bots repeating everything from posts to replies, but people would call them out. Now it’s like it’s bots all the way down or the remaining participants have resigned themselves to the decline.

Small subs still seem mostly safe, but anything with decent participation is pretty bad.

CafecitoHippo@lemm.ee · 2 months ago

Yeah the only real reason for Reddit for me anymore is sports discourse. E.g. the Baltimore Orioles are my MLB team. /r/Orioles on reddit has almost 80k members. Currently on the page there’s 62 people actively in the sub and that’s at 10am on a Wednesday, not during a game. The two Orioles communities on lemmy are Orioles@fanaticus.social and Baltimore Orioles@lemmy.world and they have 133 and 131 subscribers, respectively. There’s a bot posting game day threads and 0 comments in all of them. The only post not by a game day bot was 21 days ago.

imaqtpie@sh.itjust.works · edit-2 2 months ago

Yeah I feel you, at least the Orioles team is super stacked rn though (speaking as a Yankees fan 🫠). !yankees@fanaticus.social is equally dead.

My current thought process is that if we can get a decently active generalized baseball community going, it could provide a stepping stone to increasing the activity in the team-specific communities. I’m trying to be active on !mlb@lemmy.ml and !baseball@fanaticus.social as much as possible.

There is already a latent population of sports fans on Lemmy, but it’s sort of a self-fulfilling prophecy that the communities aren’t active so people assume there must be no other fans.

My other thought on this topic is that although I do miss the active fan discussion and game threads, the subreddits for essentially all of my teams were indisputably toxic cesspools. The whining, armchair GMing, scapegoating, and just completely idiotic takes were out of this world. So it’d be nice to have activity, but too much activity can also degrade the quality of discussion to the level of Twitter and just create a very toxic environment where fans are constantly arguing and complaining.

Blaze@reddthat.com · 2 months ago

Username checks out. Which client are you using for Lemmy?

RememberTheApollo_@lemmy.world · 2 months ago

I switch between Mlem and Voyager (iOS). I like them both, but I tend to use Voyager more. Mlem tends to give me more variety of communities, I like Voyager’s layout.

Rivers@lemmy.world · 2 months ago

Reddit went to shit when the zoomers flooded in, arguably the late 90’s kids as well

orangeboats@lemmy.world · edit-2 2 months ago

I’ve noticed that many Reddit users with the username format Word_Word_Number (for example Absolute_Bot_1230) are almost guaranteed to either be a bot or extremely inflammatory – it’s like everything they post is meant to generate controversies.
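A quick way to flag that username shape is a regular expression. This is a minimal sketch that matches only the Word_Word_Number form as described in this thread, with four digits; the pattern and the function name `looks_generated` are assumptions, not Reddit's actual generator rules. A match is a weak signal at best, since real people also end up with names in this format.

```python
import re

# Two alphabetic words and a 4-digit number, joined by underscores,
# e.g. "Absolute_Bot_1230". Assumed format, based on the thread only.
GENERATED_NAME = re.compile(r"[A-Za-z]+_[A-Za-z]+_\d{4}")


def looks_generated(username):
    """Return True if username has the Word_Word_Number shape."""
    return GENERATED_NAME.fullmatch(username) is not None


for name in ["Absolute_Bot_1230", "orangeboats", "Immersive_Matthew"]:
    print(name, looks_generated(name))
```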

meowMix2525@lemm.ee · edit-2 2 months ago

Yeah reddit has a name generator that you can choose from when you create an account and that’s the format it uses. Those names are almost exclusively bots and throwaway/anon accounts

Dasus@lemmy.world · 2 months ago

It’s Reddit’s automatic username generation, so either yeah, bots, or someone logging in through Google/Facebook and having a username assigned to them.

Syd@lemm.ee · 2 months ago

Well yeah they even have bot in their username.

wazzupdog@lemmynsfw.com · edit-2 2 months ago

I’m glad I end with word*_word_word for my screen name, lol.

xyz@lemmus.org · 2 months ago

I don’t get it. They already created a good bot network, but the username part is where they get lazy.

PDFuego@lemmy.world · 2 months ago

That’s been happening for ages. I’m sure if you check the profiles you’ll find other posts with all the same bots commenting. A lot of lazier ones wait exactly a year to repost, and it’s pretty obvious in subs for something like a live service game where they’ll be reposting complaints that are way out of date. One in the Monster Hunter sub reposted a trailer for Iceborne which had been out for 3 years by that point.

Buffalox@lemmy.world · 2 months ago

These are probably the bots that will be paid for creating content too. lol

WalrusDragonOnABike [they/them]@reddthat.com · 2 months ago

My favorite reposts were the ones that were only like 6 months later, so they’re talking about christmas or r/place as if it’s that time of year when it’s the total opposite.

TigrisMorte@kbin.social · 2 months ago

They lost so many users they needed the “engagement” numbers for the IPO so they opened the flood gate. Now they are stuck with an issue they can’t fix without admitting the fraud.

octopus_ink@lemmy.ml · edit-2 2 months ago

How far does it have to go before investors start to care I wonder? I somehow doubt OP is the only person capable of perceiving and documenting this.

TigrisMorte@kbin.social · 2 months ago

Whereas it is shifting to a front for Gov. Psy Ops just like Xitter, investors don’t matter.

egeres@lemmy.world · 2 months ago

Lemmy is not immune to this!! We need to develop FOSS to mitigate/detect that

KillingTimeItself@lemmy.dbzer0.com · 2 months ago

oh it’s simple, don’t capitalize and it’s immediately harder to do.

Chozo@fedia.io · 2 months ago

I do find it funny that you didn’t capitalize any words in this comment.

KillingTimeItself@lemmy.dbzer0.com · 2 months ago

i mean listen we’ve got priorities here. We’re capitalizing, not capitalizing.

Bonehead@kbin.social · 2 months ago

Give them some credit. They’ve finally changed the user name generator to random words instead of Adjective_Noun_####.

De_Narm@lemmy.world · 2 months ago

They have not, left is the more recent post. The right one could be real and is just recreated by these bots.

Grandwolf319@sh.itjust.works · edit-2 2 months ago

I agree, credit retracted.

Zekas@lemmy.world · 2 months ago

No, I think those comments are just unwitting humans walking into the simulation.

someguy3@lemmy.ca · 2 months ago

“It doesn’t look like anything to me.”

AlteredStateBlob@kbin.social · 2 months ago

Adjective_Noun_#### are default generated by reddit, so it seems they at least upgraded to their own generator.

gandalf_der_12te@discuss.tchncs.de · 2 months ago

IMO the only way to not be infected by bot content is to not be popular, or small enough to be irrelevant.

mPony@lemmy.world · 2 months ago

Popularity is overrated. Irrelevance is freedom.

flango@lemmy.eco.br · 2 months ago

Exactly!

nichtsowichtig@feddit.de · 2 months ago

I wonder what the fediverse’s answer will be to this problem once it gets popular. Will instances that have a lot of bot content be defederated? Some kind of fedipact against bot (unlabeled) content?

greencactus@lemmy.world · 2 months ago

Thank you. That is the day when I’ll finally stop using Reddit. I never would have thought that bots write that realistically, so thank you for proving it.

meowMix2525@lemm.ee · 2 months ago

Well they actually don’t write that realistically, these are copy and paste bots that are just trying to farm karma so they can later sell the account (which I’ve heard is a thing apparently?). You can see the left is all original accounts by the uniqueness of their usernames and the copied posts on the right are all reddit generated names.

derpgon@programming.dev · 2 months ago

The left image is the original post, 10 months old, where (at least most of) the users are real people. Right is full of bots copying the post 1:1, comments included.

fne8w2ah@lemmy.world · 2 months ago

Or worse still, AI (really LLM) farming.

Immersive_Matthew@sh.itjust.works · 2 months ago

Would be even harder to detect now that AI can write the same message in different ways. I question every comment I read, especially the ones appealing to one’s emotions.

Oneobi@lemmy.world · 2 months ago

Hang on a sec, how do we know you’re not a bot lol

You raise a valid point. Hive mind and weaponising narrative is a danger to us all.

Fubarberry@sopuli.xyz · edit-2 2 months ago

As an AI language model, it would be highly irresponsible for me to impersonate users on a website. This action violates privacy rights by potentially accessing and misusing personal information. Impersonation involves deception, undermining trust in both the AI and the platform where it operates. Furthermore, it can have legal implications, such as violating terms of service agreements or privacy laws. Ultimately, engaging in impersonation could lead to negative publicity and damage the reputation of the AI and the platform it serves.

/s

Immersive_Matthew@sh.itjust.works · 2 months ago

I get the sarcasm, but this is written as if there is one AI, when the reality is who knows how many individually run instances, all under whatever rules their implementers choose.

DanTDM@lemmy.world · 2 months ago

Thank god this isn’t a problem here

P░ U░S░S░ Y░I░ N░B░I░O ░

PrettyFlyForAFatGuy@feddit.uk · 2 months ago

hey…

There’s no pussy in your bio…

😡

P03 Locke@lemmy.dbzer0.com · 2 months ago

Just keep clicking. You’ll get to the malware eventually.

cumskin_genocide@lemm.ee · 2 months ago

A strange thing on reddit is that if you make a new account and then make a comment that gets like 8 down votes then that new account gets shadow banned.

They’ve implemented so many rules that it encourages new users to act in the same way as the hive mind. Where even if you are an actual user then you are indistinguishable from a bot. Basically you’ve become a living NPC.

blind3rdeye@lemm.ee · edit-2 2 months ago

Yeah, I’ve seen that a bunch of times. Some subreddits seem to be particularly popular places to karma-farm to make convincing sock-puppet accounts to sell. Often someone in the thread points out that it is a bot repost - but the fake post and fake comments are easier to engage with compared to the accusation that someone is a karma-farming bot.

(And of course, these bots-in-training will upvote each other’s comments and posts… so it always looks pretty popular.)

solstice@lemmy.world · 2 months ago