Personal Data Import/Export

The other small concern of mine is facilitating full data export. If a user is on a big pod like JoinDiaspora.com and has two years' worth of posts, photos, and a couple hundred contacts, exporting all of that could be database-intensive, especially if lots of people try to do a full export on the same day.

Would it be a good idea to set a variable defining how long a user must wait between requesting all of their data and downloading it, so that the database doesn't get too strained if everyone tried to download all of their data at the same time?

Facebook handles this nicely by putting the request in a queue and emailing the user later, when the archive is ready.
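A minimal sketch of the queue-and-notify approach, assuming a hypothetical per-user cooldown (`EXPORT_COOLDOWN_SECONDS`) between export requests rather than a fixed delay before each download. `ExportQueue`, `build_archive` and `notify` are illustrative names, not diaspora* code. The worker builds only a bounded number of archives per run, which is what keeps the database load spread out.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List

# Hypothetical setting: how long a user must wait before requesting another export.
EXPORT_COOLDOWN_SECONDS = 24 * 60 * 60


@dataclass
class ExportQueue:
    """Queue full-account exports instead of building them on demand."""
    notify: Callable[[str, str], None]  # e.g. send an email: (user, message)
    cooldown: float = EXPORT_COOLDOWN_SECONDS
    clock: Callable[[], float] = time.monotonic
    pending: Deque[str] = field(default_factory=deque)
    last_request: Dict[str, float] = field(default_factory=dict)

    def request(self, user: str) -> bool:
        """Enqueue an export; refuse it if the user asked too recently."""
        now = self.clock()
        last = self.last_request.get(user)
        if last is not None and now - last < self.cooldown:
            return False
        if user in self.pending:
            return False  # an earlier request is still waiting in the queue
        self.last_request[user] = now
        self.pending.append(user)
        return True

    def work(self, build_archive: Callable[[str], str], max_jobs: int = 1) -> List[str]:
        """Build at most max_jobs archives per run so database load stays bounded."""
        done: List[str] = []
        for _ in range(min(max_jobs, len(self.pending))):
            user = self.pending.popleft()
            url = build_archive(user)  # the expensive, database-heavy step
            self.notify(user, f"Your archive is ready: {url}")
            done.append(user)
        return done
```

A background job would call `work()` periodically; how often, and with what `max_jobs`, would be a per-pod tuning choice.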

@jonneha About authentication on the new pod: once again, Persona can be the solution there, because the user will be able to log in on every website, so we can simply ask the user to log in. We really need to use Persona for authentication. I will take a look at that in February.

@flaburgan I was talking about server to server authentication, not client to server.

Hm, if the new pod is able to know that the user who wants to import data is the one who is connected on the old pod, the problem is solved, isn't it?

How do you prove that to the other pods?

Jonne, I’m not sure we need such rigorous authentication if we’re talking about a two-step process, as follows:

  1. User exports their entire account data to a local file on their machine.
  2. User creates account on new pod and imports account data file to new account.

Only the authenticated user of the original account will be able to export the data from that account, and if they’re downloading it to their local machine before uploading it to the new pod, unless they do something stupid, the file won’t get into third-party hands.

If we’re talking about direct transfer of account data from one pod to another pod, then yes, secure keys will be needed.

@flaburgan I looked into it, and while Persona would be our choice for SSO, I don't think it's what we need here. I'm not going to rip out our entire authentication system just so we can add one little feature.

@jonneha An alternative idea is tokens. Say a user downloads their data from podA and wants to migrate to podB. We could embed a token in a JSON file inside the archive; then, before an upload happens on podB, the server checks the token (like a checksum) against podA's /user/:id/token endpoint and makes sure they match. If they don't, the archive didn't come from podA and it's an impostor. This lets podB make sure the user in question is actually coming from podA. It proves identity, and IMHO building a system where identity doesn't matter is just opening the floodgates for spam and other social engineering.

I don't think most people are observant enough to notice that jonne@givemeyourcreditcardmotherfucker.com is not the same as jonne@social.mrxyx.com, especially with the exact same profile, the exact same picture, and the exact same status updates. It would be even less obvious with something like jonne@social.mrzyx.co vs. jonne@social.mrzyx.com.
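The token check proposed above can be sketched as follows. Everything here is hypothetical (`OldPod`, `check_token`, `accept_upload`, the archive's JSON keys), and the HTTPS call to podA's /user/:id/token endpoint is replaced by a direct method call. One design choice differs from a literal reading: podB sends the token to podA and podA only answers yes or no, because an endpoint that handed out the token itself would let anyone fetch it and forge an archive.

```python
import hmac
import json
import secrets
from typing import Callable, Dict


class OldPod:
    """podA: issues a one-off token with every export (hypothetical design)."""

    def __init__(self) -> None:
        self.tokens: Dict[str, str] = {}

    def export_archive(self, user_id: str, data: dict) -> str:
        token = secrets.token_hex(32)
        self.tokens[user_id] = token
        # The token travels inside the archive's JSON, next to the user's data.
        return json.dumps({"user_id": user_id, "token": token, "data": data})

    def check_token(self, user_id: str, presented: str) -> bool:
        # Stand-in for podB calling podA's /user/:id/token endpoint over HTTPS.
        expected = self.tokens.get(user_id)
        if expected is None:
            return False
        # compare_digest does not reveal how many leading characters matched.
        return hmac.compare_digest(expected.encode(), presented.encode())


def accept_upload(archive_json: str, check_token: Callable[[str, str], bool]) -> bool:
    """podB: accept the archive only if podA confirms the embedded token."""
    archive = json.loads(archive_json)
    return check_token(archive["user_id"], archive["token"])
```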

@goob Again, there's no way to prove that said user is in fact the user from their originating pod. I personally think identity matters, and identity isn't just your login@pod_host.com…

My aim would be a seamless migration. Let's say we have bob@podA, who has friended alice@podB, and he wants to move to bob@podC. I'd like Bob's contact entry for Alice on podB to change without any action from Alice. This could be done by podC sending an "I'm here now!" kind of message to podB. The only way for podC to prove that it now holds bob@podA's account is by signing that message with Bob's existing private key; podB already has bob@podA's public key and can thus verify the signature and update all records.
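A sketch of the signed "I'm here now!" message, assuming a hypothetical JSON message format and a `ContactPod` class standing in for podB. To stay self-contained it uses toy, unpadded textbook RSA with two hard-coded Mersenne primes, which is insecure and only shows the flow: podC signs with Bob's private key, podB verifies with the public key it already holds and then rewrites the contact handle.

```python
import hashlib
import json
from typing import Dict, Tuple

# Toy, unpadded "textbook" RSA built from two hard-coded Mersenne primes.
# Far too small for real use; it only shows the sign/verify flow. A pod
# would use Bob's existing RSA key through a proper crypto library.
P, Q = 2**61 - 1, 2**31 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # Bob's private exponent (Python 3.8+)


def digest(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, d: int = D, n: int = N) -> int:
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, e: int, n: int) -> bool:
    return pow(signature, e, n) == digest(message, n)


def moved_message(old: str, new: str) -> bytes:
    # Hypothetical format for the notice podC sends on Bob's behalf.
    return json.dumps({"type": "moved", "old": old, "new": new}, sort_keys=True).encode()


class ContactPod:
    """podB: already holds bob@podA's public key, so it can check the notice."""

    def __init__(self) -> None:
        self.public_keys: Dict[str, Tuple[int, int]] = {}  # handle -> (e, n)
        self.contacts: Dict[str, str] = {}  # local user -> contact handle

    def handle_move(self, message: bytes, signature: int) -> bool:
        body = json.loads(message)
        key = self.public_keys.get(body["old"])
        if key is None or not verify(message, signature, *key):
            return False  # unknown sender or bad signature: change nothing
        for user, handle in self.contacts.items():
            if handle == body["old"]:
                self.contacts[user] = body["new"]  # no action needed from Alice
        self.public_keys[body["new"]] = self.public_keys.pop(body["old"])
        return True
```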

@tomscott I think you’re missing the point of what I said.

There is currently nothing to stop me from opening an account on a pod under the name Tom Scott, and pretending to be you. However, what I won’t be able to do under the scheme I outlined is to import your account data from your existing account to the new fake account I have set up, because I won’t have access to the account data. Only you (and, perhaps, your podmin) have access to those data, so only you will be able to create a new account and import the data from your existing account. I suppose a simple key could be added to the data export, which the new pod could then check with the old pod to confirm that this is a genuine export from the old pod.

@jonneha If you think a seamless transfer is feasible, that would be far preferable. It may prove to be the easier way of transferring all the connections with other accounts. The only thing I'd add to what you said is that, from a privacy point of view, it might be wise to notify Alice (and others) with a message along the lines of:

‘Your contact Bob has moved his account to podC. Click here to confirm that you are happy to continue sharing with him at this new pod, or click here if you wish to stop sharing with Bob’.

The reason is that Alice might feel she can't trust the podmin of podC and want nothing to do with that pod, nor want any of her posts to be potentially available to that podmin through her sharing with Bob. There may be no logical reason for her to feel this way, but Diaspora seems to be all about being up-front about exactly what's happening with your data, so I think it would be good to notify a person's contacts when that person moves pods, just to make people aware of what is happening and let them choose whether to continue sharing.

I suppose that when someone starts migrating to another pod, they could be shown a pop-up offering to notify all their contacts, 'Hi, just to let you know I'm moving to podC. This won't affect our contact and you don't need to do anything', so that Bob's contacts know in advance.
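The confirm-or-stop notification suggested above could be modelled on Alice's pod as a small state change on the contact. This is a sketch with hypothetical names (`Sharing`, `Contact`, `contact_moved`, `answer`), and it assumes one particular policy: nothing new is shared with the new pod until Alice has answered. Defaulting to "keep sharing" would be an equally valid choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sharing(Enum):
    ACTIVE = "active"
    AWAITING_CONFIRMATION = "awaiting confirmation"
    STOPPED = "stopped"


@dataclass
class Contact:
    handle: str
    state: Sharing = Sharing.ACTIVE
    notice: Optional[str] = None


def contact_moved(contact: Contact, new_handle: str, new_pod: str) -> None:
    """Called on Alice's pod once a move notice has been verified."""
    contact.handle = new_handle
    contact.state = Sharing.AWAITING_CONFIRMATION
    contact.notice = (f"Your contact has moved their account to {new_pod}. "
                      "Confirm to keep sharing, or stop sharing.")


def answer(contact: Contact, keep_sharing: bool) -> None:
    if contact.state is not Sharing.AWAITING_CONFIRMATION:
        raise ValueError("no pending move to answer")
    contact.state = Sharing.ACTIVE if keep_sharing else Sharing.STOPPED
    contact.notice = None


def may_share_with(contact: Contact) -> bool:
    # Assumption: nothing new goes to the new pod until Alice has decided.
    return contact.state is Sharing.ACTIVE
```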

@goob I see what you're saying. I think we're on the same page. :slight_smile:

What do you guys think is easier, though: a seamless pod->pod transfer, where we have to code and test every step of the process, or letting humans do the downloading and uploading of data, as long as the server(s) maintain the authorization process? I'm thinking the latter would be much easier to implement and just as secure as having pods take care of the "whole shebang".

I think the seamless transfer Jonne is talking about is best as an ultimate aim, but I suspect there is a lot of coding and checking to do on the way to it. I think a two-step manual export/import could work well as a stop-gap measure, allowing those who really want to migrate pods to do so sooner rather than later. (I'd like to, as I want to set up my own pod, and this is one of the things stopping me.)

If the manual two-step process, perhaps with an authentication key as part of the export package, is easy to implement and can be done soon, I'd say let's implement that as an interim measure and work on seamless migration in the longer term.

My use of the word seamless was entirely scoped to setting up and maintaining existing relationships. IMO a migration is a transfer, a move, not a duplication of an account that leaves the old one abandoned, as I basically outlined in my proposal, which is over a year old but still stands. The way the archive is transferred is more of an implementation detail: it could be download/upload, a POST to a special route on the old pod, generating a URL on the old pod and entering it on the new pod, all of the above, or whatever else. One benefit of my approach (given the archive is downloadable) is that it could serve as a backup, since deleting the old account is attempted but not required for the process to complete, so there's no point at which the old pod still needs to be reachable.
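The "deletion is attempted but not required" property can be sketched as a migration step that treats the import as mandatory and the old-pod deletion as best effort. The function name and the two callbacks are hypothetical; how the archive reaches the new pod is left open, as the post says.

```python
from typing import Callable


def migrate(archive: bytes,
            import_archive: Callable[[bytes], str],
            delete_old_account: Callable[[], None]) -> dict:
    """Move an account: the import must succeed, the old-pod deletion is best effort."""
    new_handle = import_archive(archive)  # failure here aborts the migration
    try:
        delete_old_account()
        old_deleted = True
    except OSError:
        # Connection errors and timeouts are OSError subclasses: the old pod
        # may be gone, which must not block the move.
        old_deleted = False
    # The downloaded archive itself can still be kept by the user as a backup.
    return {"handle": new_handle, "old_account_deleted": old_deleted}
```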

The data exporting is needed regardless of whether we allow it to be uploaded :slight_smile: But of course we already have that. Just saying.

Jonne, with the two-step download/upload method, a step could be built into the activation of the new account: having had a chance to check that everything has uploaded properly and is working as it should, the user is prompted to click a button to fully activate the new account (i.e. it was in a testing mode before that). At this point, the key is sent back to the old pod to trigger closing the old account, if the user hasn't already done that. This should solve the problem of duplicate accounts lingering all over the place. I hope.
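The testing-mode-then-activate flow described above might look like this, under the assumption that the export key doubles as the migration key the old pod recognises. `AccountState`, `ImportedAccount` and `OldPodAccounts` are hypothetical; the old pod's close is idempotent, so it does not matter if the user already closed the account by hand.

```python
from enum import Enum, auto
from typing import Callable, Dict


class AccountState(Enum):
    TESTING = auto()  # imported, user is still checking everything came across
    ACTIVE = auto()


class OldPodAccounts:
    """Old pod side: closes whichever account the returned key belongs to."""

    def __init__(self) -> None:
        self.open_accounts: Dict[str, str] = {}  # migration key -> handle

    def close_by_key(self, key: str) -> None:
        # Idempotent: if the user already closed the account, nothing happens.
        self.open_accounts.pop(key, None)


class ImportedAccount:
    def __init__(self, migration_key: str, close_old: Callable[[str], None]) -> None:
        self.state = AccountState.TESTING
        self.migration_key = migration_key
        self._close_old = close_old

    def activate(self) -> None:
        """The 'fully activate' button: send the key back, then go live."""
        if self.state is AccountState.ACTIVE:
            return
        self._close_old(self.migration_key)
        self.state = AccountState.ACTIVE
```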

Jason, at the moment you can only export a very limited range of your account data. At least, that was still true the last time I tried it.

@jasonrobinson is right about the account export options. We've got photos and the profile; I'm not sure if we have posts yet, but it doesn't matter.

I say that since it's already shipped, we work off that, and if we choose to, we can expand the export option(s) in the future. But I think there's enough data there to experiment with building a pod->pod migration option.

@jonneha One issue with your data export model: wouldn't we eventually need to write the archive to local disk and then POST it to an endpoint somewhere else? If so, I have a couple of questions about how we'd do it:

  1. A lot of web servers aren't configured to handle uploads of that size (with a reasonable number of photos, these archives could get rather large quickly). Would POSTing around 50 MB of data to an endpoint on a pod require some kind of custom server configuration?
  2. How would Heroku users handle this? Files written to disk on Heroku don't persist (its filesystem is ephemeral), so how would we export to Heroku pods?
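One common way around per-request size limits, not something proposed in the thread, is to split the archive into chunks and reassemble it on the receiving pod, checking the total size and a SHA-256 hash at the end. This is a sketch: `CHUNK_SIZE`, `chunks` and `ChunkReceiver` are hypothetical, and the receiver keeps parts in memory only for illustration; where a real pod would store them (local disk or external storage) is left open.

```python
import hashlib
from typing import Dict, Iterator, Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # hypothetical; pick below the server's request body limit


def chunks(archive: bytes, size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, piece) pairs; each piece would be one small POST."""
    for offset in range(0, len(archive), size):
        yield offset, archive[offset:offset + size]


class ChunkReceiver:
    def __init__(self, total_size: int, sha256_hex: str) -> None:
        self.total_size = total_size
        self.sha256_hex = sha256_hex
        self.parts: Dict[int, bytes] = {}

    def receive(self, offset: int, data: bytes) -> None:
        self.parts[offset] = data  # a retried chunk just overwrites its offset

    def assemble(self) -> bytes:
        blob = b"".join(self.parts[o] for o in sorted(self.parts))
        if len(blob) != self.total_size or hashlib.sha256(blob).hexdigest() != self.sha256_hex:
            raise ValueError("archive incomplete or corrupted")
        return blob
```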

IMO, making users manage the upload/download process manually would be a lot easier and still retain the level of security we need. API calls that check a token for validity would be all we need to make sure the "right" user is trying to establish the "right" account on a different pod.

I could imagine a solution using STUN to punch a connection through any firewall or NAT, with the servers involved connecting directly and exchanging as much data as they want over random ports they negotiate over the normal HTTPS protocol.
(A little like FTP, with a control connection and a separate data link)

Of course … that’s a LOT of work
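Only the first step of the STUN idea is sketched here: building a Binding Request (RFC 5389: message type 0x0001, the magic cookie 0x2112A442 and a 96-bit transaction ID) and reading the public IPv4 address and port from the XOR-MAPPED-ADDRESS attribute of a success response. Sending it over UDP, the actual hole punching and the separate data link are left out. Function names are illustrative.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def binding_request(txid: Optional[bytes] = None) -> bytes:
    """20-byte STUN header with no attributes (message length 0)."""
    txid = txid or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txid)


def mapped_address(response: bytes, txid: bytes) -> Optional[Tuple[str, int]]:
    """Return the (ip, port) the STUN server saw us coming from, IPv4 only."""
    msg_type, length, cookie, rtxid = struct.unpack("!HHI12s", response[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or rtxid != txid:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # family: IPv4
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```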

Proposal: Import pod data from the Export archive

Only registered users can do this, from their Account page. At this time, we will require migrating users to register on the new pod, export their data from the old pod, and then import it into the new one. The data import feature will be simple, and basically just read the JSON and photos into the new user's account.

You literally just upload JSON (or however we’re storing it) and photos, and we take care of the rest.
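A minimal sketch of the import step in this proposal, assuming a hypothetical archive layout (a zip with the account data in `user.json` and photos under `photos/`) and a plain dict standing in for the new user's record. The real export format may differ; only the profile, posts and photo names are copied here, and the user keeps the handle of their new account.

```python
import io
import json
import zipfile
from typing import Dict, Tuple


def read_export(archive: bytes) -> Tuple[dict, Dict[str, bytes]]:
    """Split a hypothetical export zip into the account JSON and the photo files."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
        if "user.json" not in names:
            raise ValueError("archive has no user.json")
        account = json.loads(zf.read("user.json"))
        photos = {name[len("photos/"):]: zf.read(name)
                  for name in names
                  if name.startswith("photos/") and not name.endswith("/")}
    return account, photos


def import_into(new_user: dict, archive: bytes) -> dict:
    """Read the JSON and photos into the new user's (already registered) account."""
    account, photos = read_export(archive)
    new_user["profile"] = account.get("profile", {})
    new_user["posts"] = account.get("posts", [])
    new_user["photos"] = sorted(photos)  # a real pod would store the bytes too
    return new_user
```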


Outcome: N/A

Votes:

  • Yes: 6
  • Abstain: 0
  • No: 0
  • Block: 0

Note: This proposal was imported from Loomio. Vote details, some comments and metadata were not imported.