ArchiveTeam wants a copy of all our sites

koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Tue Apr 24, 2012 5:20 pm

I have just been contacted by a member of a group called "ArchiveTeam" who asked for, verbatim quote:

"we want a full archive of everything on the parodius network"

My response was as follows:
Plain and simple:

Not going to happen, and I'm not going to give approval for this.

Each and every hosted person here has a right to define whether or not they want their content archived. Some may have robots.txt in place, others may not but might have other conditionals (some technical, some via footers/agreements on their page). Their data is their data; I am not the owner of their data. I cannot decide for them if they are comfortable with that.

In fact, given our highly moral and ethical values, I'm a little surprised you'd even ask for this. I hope I'm misunderstanding your request, otherwise I'm actually a bit offended by it.
I state this here, publicly, given that many of our hosted users are members here on the forum.

If you feel comfortable with someone archiving your data like this, who is a third-party, I would recommend you contact them and offer/work out something. Otherwise, hosted users' data, as I said, is their own data. I will never, ever agree to such requests on behalf of people we host. Like I said: your data is your data, and the decision is not mine to make.

3gengames
Formerly 65024U
Posts: 2281
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames » Tue Apr 24, 2012 5:46 pm

I think all the tech docs here on the site should also be saved. But my PMs and stuff won't be useful to people, honestly, and I don't like the idea of someone else having them either, as some stuff is supposed to be kept a little closer to the vest.

Kit Sniper
Posts: 15
Joined: Sun Nov 02, 2008 11:16 pm
Location: Mexico

Post by Kit Sniper » Tue Apr 24, 2012 7:54 pm

I was expecting them to say something.

They basically take site archives and distribute them via torrent / Archive.org / other sites. They do good stuff but I've always been iffy about the legality of the thing. And the privacy part.

At least in my regard the answer is a resounding no. My site is my site and it's not going to go down until I die.

Edit: Oh goodie.

http://www.archiveteam.org/index.php?ti ... Networking

This does not look good.

koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Tue Apr 24, 2012 8:04 pm

Kit Sniper wrote:I was expecting them to say something.

They basically take site archives and distribute them via torrent / Archive.org / other sites. They do good stuff but I've always been iffy about the legality of the thing. And the privacy part.

At least in my regard the answer is a resounding no. My site is my site and it's not going to go down until I die.

Edit: Oh goodie.

http://www.archiveteam.org/index.php?ti ... Networking

This does not look good.
Well, I'm fine with them archiving the home page, our FAQ, etc. -- sure, that's all public anyway. But our home page/etc. != users' content. They will need to get each individual site owner's permission to archive stuff. And I will be very, very pissed if they run some kind of scraping bot against everything without talking to me first. Bandwidth doesn't grow on trees.

I'm not sure they should bother at this point anyway -- if they were fair, they'd simply wait until shortly before October to talk to me. Most of the site owners will be moving their stuff to other URLs, which means all the content/etc. will be available on the Internet, just at a new URL. Thus I fail to see the point in archiving it. For things that don't get moved, that's something that can be discussed later.

Kit Sniper
Posts: 15
Joined: Sun Nov 02, 2008 11:16 pm
Location: Mexico

Post by Kit Sniper » Tue Apr 24, 2012 8:08 pm

koitsu wrote:Well, I'm fine with them archiving the home page, our FAQ, etc. -- sure, that's all public anyway. But our home page/etc. != users' content. They will need to get each individual site owner's permission to archive stuff.

I'm not sure they should bother at this point anyway -- if they were fair, they'd simply wait until shortly before October to talk to me. Most of the site owners will be moving their stuff to other URLs, which means all the content/etc. will be available on the Internet, just at a new URL. Thus I fail to see the point in archiving it. For things that don't get moved, that's something that can be discussed later.
I've followed them for a little while now and they'll be making copies of everything hosted at Parodius, not just the index, with or without permission. Even sites that won't go away, like mine.

They won't be able to get many of the files kept under directories with an index, but I still don't want them to archive my site. I'm not going away anytime soon.

Edit:
3gengames - when they say archive, they basically mean mirrors of sites. For example, they'd back up the forum posts, but not the private data like PMs.

koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Tue Apr 24, 2012 8:18 pm

Well, if what you say is true, then I hope my previous statement to them in PM is a sufficient deterrent. I said no. If they do it anyway and it violates an individual site's terms, then it's up to the individual to contact them and deal with it in some way. But overall I've given them my statement: do not do this. Bandwidth = not free. The last thing I need is to find my 95th percentile at 20 Mbit because someone thinks bandwidth grows on trees.

LocalH
Posts: 173
Joined: Thu Mar 02, 2006 12:30 pm

Post by LocalH » Tue Apr 24, 2012 8:48 pm

Perhaps it may be worth investigating a proactive block, if you can identify any IP ranges that they use to scrape such content? Figure out which ranges of addresses need blocking and then block them server wide, so that they can't eat up valuable bandwidth. Just an idea, and if it's possible to do then each client can still voluntarily provide their site content to ArchiveTeam if they so choose.
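
For illustration, here is a minimal sketch of the range check that such a server-wide block performs, written in Python with the standard ipaddress module. The ranges are reserved documentation networks (RFC 5737) used purely as placeholders; they are not addresses ArchiveTeam is known to use, and the names BLOCKED_RANGES and is_blocked are hypothetical. In practice a block like this would more likely live in a firewall or web server configuration than in application code.

```python
import ipaddress

# Placeholder blocklist: RFC 5737 documentation ranges, NOT real scraper addresses.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_addr):
    """Return True if client_addr falls inside any blocked range."""
    ip = ipaddress.ip_address(client_addr)
    return any(ip in network for network in BLOCKED_RANGES)


if __name__ == "__main__":
    for addr in ("192.0.2.17", "203.0.113.5"):
        print(addr, "blocked" if is_blocked(addr) else "allowed")
```

A check like this only helps if the ranges can be identified ahead of time; any address outside the list passes straight through.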

Kit Sniper
Posts: 15
Joined: Sun Nov 02, 2008 11:16 pm
Location: Mexico

Post by Kit Sniper » Tue Apr 24, 2012 8:55 pm

LocalH wrote:Perhaps it may be worth investigating a proactive block, if you can identify any IP ranges that they use to scrape such content? Figure out which ranges of addresses need blocking and then block them server wide, so that they can't eat up valuable bandwidth. Just an idea, and if it's possible to do then each client can still voluntarily provide their site content to ArchiveTeam if they so choose.
That won't work.

They basically run wget scripts on sites from various locations across the world by coordinating volunteers via their IRC channel. So even if you block one address range from Topeka, someone from France might go at it.

The good thing is, they don't really put ten people to download the same site at the same time. They get people to do segments and once they're done, they're done. There are no redundant scrapes. So while they may be downloading everything... they won't do it repeatedly. :\

Tormenter
Posts: 303
Joined: Sat Jun 03, 2006 9:17 pm

Post by Tormenter » Wed Apr 25, 2012 1:29 pm

What's the big deal about having an archive of all of this information, instead of letting it go offline to never be seen or used again? IMO, that's pretty much a kick in the ass to everyone in the community.

koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Wed Apr 25, 2012 1:58 pm

Tormenter wrote:What's the big deal about having an archive of all of this information, instead of letting it go offline to never be seen or used again? IMO, that's pretty much a kick in the ass to everyone in the community.
1. What makes you think the information is being "let go offline never to be seen or used again?" Neither you nor these ArchiveTeam folks have any insight into that. You don't know what our hosted users are doing, and neither do they.

2. The problem -- from my perspective, and that's why I made the post sticky -- is that the ArchiveTeam folks asked me for permission to download all of our hosted sites. **I** am not the person to ask when it comes to other people's data. If they want to archive (for example) Kitsune's sites then they need to talk to him, not me. If they want to archive NESWorld, then they need to talk to Martin. If they can't figure out who to ask (i.e. owner doesn't disclose contact information), then asking me won't solve that either.

The point is: I don't own our hosted users' data. They own their data. Decisions like this need to be made by the hosted users on a per-user basis and not by me. Nothing gives me the right to make decisions for them.

3. From a technical level, the "big deal" has to do with network traffic. I think this is the 2nd or 3rd time I've brought up this point in recent threads where you've commented. I will repeat, and make bold: bandwidth/network traffic is expensive. It is not free. You may want to read up on what 95th-percentile billing is about -- because it's what datacenters/co-location providers use. It may not be something you've seen before because most low-end "hosting" environments look at things from a volumetric point of view, but no datacenter does (or carrier/transport provider, for that matter). 95th-percentile can screw a person out of tens of thousands of dollars in bandwidth overage fees.
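
The billing model in point 3 can be sketched roughly as follows. This is a minimal Python illustration, assuming the commonly used scheme of one throughput sample every 5 minutes, with the top 5% of a month's samples discarded and the highest remaining sample billed. Sampling intervals and rounding rules vary by provider, and the function name and traffic figures are hypothetical, not Parodius's actual numbers.

```python
def billable_rate_95th(samples_mbit):
    """Return the highest sample left after discarding the top 5% (nearest-rank)."""
    if not samples_mbit:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbit)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


SAMPLES_PER_DAY = 24 * 60 // 5          # 288 five-minute samples per day
MONTH = 30 * SAMPLES_PER_DAY            # 8640 samples in a 30-day month

# Hypothetical traffic: a 2 Mbit baseline, plus a scrape running at 20 Mbit.
quiet_month = [2.0] * MONTH
one_day_scrape = [2.0] * (MONTH - SAMPLES_PER_DAY) + [20.0] * SAMPLES_PER_DAY
three_day_scrape = [2.0] * (MONTH - 3 * SAMPLES_PER_DAY) + [20.0] * (3 * SAMPLES_PER_DAY)

if __name__ == "__main__":
    print(billable_rate_95th(quiet_month))       # baseline only
    print(billable_rate_95th(one_day_scrape))    # burst fits inside the discarded 5%
    print(billable_rate_95th(three_day_scrape))  # burst exceeds 5% and sets the bill
```

Under these assumptions the discarded 5% covers about 36 hours of a 30-day month, so a burst shorter than that costs nothing extra, while a sustained scrape lasting longer means the whole month is billed at the burst rate.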

Is there anything constructive you can add to any of the threads you've posted in? Sorry for getting combative, but all I've seen is peanut-gallery comments passing judgement and asking "why" in a smarmy way. What do you have that's positive that you can bring to the table? Because I welcome such.

tepples
Posts: 22086
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Wed Apr 25, 2012 2:02 pm

Perhaps I should get with them and tell them what's already being done to archive the nesdev subdomain and wiki.nesdev.com domain.

EDIT: I summarized koitsu's posts in this topic into AT's page about Parodius.

koitsu
Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Post by koitsu » Wed Apr 25, 2012 9:07 pm

tepples wrote:Perhaps I should get with them and tell them what's already being done to archive the nesdev subdomain and wiki.nesdev.com domain.

EDIT: I summarized koitsu's posts in this topic into AT's page about Parodius.
Thanks much man. I appreciate the effort; right now I have too much going on (with all of this stuff -- you should see my inbox -- and with doctor's visits, work chaos (probably the most chaos I've ever seen), etc... I get to have an endoscopy tomorrow, for example. Hooray...)

Kit Sniper
Posts: 15
Joined: Sun Nov 02, 2008 11:16 pm
Location: Mexico

Post by Kit Sniper » Wed Apr 25, 2012 9:59 pm

tepples wrote:Perhaps I should get with them and tell them what's already being done to archive the nesdev subdomain and wiki.nesdev.com domain.

EDIT: I summarized koitsu's posts in this topic into AT's page about Parodius.
Um... it's foxhack.net, not .com :P Could you please fix that?

I own the .net / com / org domains but only use the .net one. Thanks for letting them know about it, and I'm writing a post about that at my site too.

Tormenter
Posts: 303
Joined: Sat Jun 03, 2006 9:17 pm

Post by Tormenter » Thu Apr 26, 2012 9:20 am

koitsu wrote:
Tormenter wrote:What's the big deal about having an archive of all of this information, instead of letting it go offline to never be seen or used again? IMO, that's pretty much a kick in the ass to everyone in the community.
1. What makes you think the information is being "let go offline never to be seen or used again?" Neither you nor these ArchiveTeam folks have any insight into that. You don't know what our hosted users are doing, and neither do they.

2. The problem -- from my perspective, and that's why I made the post sticky -- is that the ArchiveTeam folks asked me for permission to download all of our hosted sites. **I** am not the person to ask when it comes to other people's data. If they want to archive (for example) Kitsune's sites then they need to talk to him, not me. If they want to archive NESWorld, then they need to talk to Martin. If they can't figure out who to ask (i.e. owner doesn't disclose contact information), then asking me won't solve that either.

The point is: I don't own our hosted users' data. They own their data. Decisions like this need to be made by the hosted users on a per-user basis and not by me. Nothing gives me the right to make decisions for them.

3. From a technical level, the "big deal" has to do with network traffic. I think this is the 2nd or 3rd time I've brought up this point in recent threads where you've commented. I will repeat, and make bold: bandwidth/network traffic is expensive. It is not free. You may want to read up on what 95th-percentile billing is about -- because it's what datacenters/co-location providers use. It may not be something you've seen before because most low-end "hosting" environments look at things from a volumetric point of view, but no datacenter does (or carrier/transport provider, for that matter). 95th-percentile can screw a person out of tens of thousands of dollars in bandwidth overage fees.

Is there anything constructive you can add to any of the threads you've posted in? Sorry for getting combative, but all I've seen is peanut-gallery comments passing judgement and asking "why" in a smarmy way. What do you have that's positive that you can bring to the table? Because I welcome such.
I host many sites; I know how much traffic costs. This is just a message board, and could easily get by on a $100 a year plan with no problems.

tepples
Posts: 22086
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Thu Apr 26, 2012 10:22 am

Yeah, perhaps part of the difference is that one of you is talking about "*.parodius.com" and the other about "nesdev.com and wiki.nesdev.com".
