Stopping indexing of profile pages


(goob) #1

Profile pages are, as I understand it, not supposed to be indexed by search engines and crawlers. However, this is not what’s happening.

Profile pages are being indexed by at least some crawlers (Google, for example). It is not the content of the pages that gets indexed but the existence of the profile and its URL, and clicking on the URL takes you to the profile page, even if you’re not logged in.

If I put “Diaspora HQ” into Google, I get:

Diaspora HQ
https://joindiaspora.com/u/diasporahq
A description for this result is not available because of this site’s robots.txt – learn more.

The same happens for my own profile page.

So things are a bit confused. Either we want search engines to index profile pages properly, or we don’t want them to index them at all.

The robots.txt disallow for /u/ and /people/ seems to stop the crawlers from indexing the content of the profile pages, but not their existence and URL, and from there it’s easy to view the content.
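What a `Disallow` rule actually decides can be checked with Python's standard `urllib.robotparser` module: it only answers whether a compliant crawler may *fetch* a URL. The sketch below assumes a robots.txt containing just the two rules mentioned above (a pod's real file may contain more), and the `/people/` and `/posts/` URLs are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: only the two Disallow rules discussed in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /u/
Disallow: /people/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# can_fetch() only answers "may this user agent download this URL?".
for url in (
    "https://joindiaspora.com/u/diasporahq",
    "https://joindiaspora.com/people/example-guid",  # hypothetical URL
    "https://joindiaspora.com/posts/1",              # hypothetical URL
):
    print(url, parser.can_fetch("Googlebot", url))
```

The two profile URLs come back `False` and the post URL `True`. Nothing in that answer covers whether a URL may be *listed*: a search engine can still show a blocked URL it has discovered through links elsewhere, which fits the "description not available" result above.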

Would adding

to the header of each profile page prevent the crawlers from indexing profile pages completely? If so, could it easily be coded so that this line is added to profile pages whenever they’re created?
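A widely supported per-page mechanism for this is a robots `meta` element with a `noindex` value in the page's `<head>`, or the equivalent `X-Robots-Tag: noindex` HTTP response header. One caveat: a crawler only sees the tag if it is allowed to fetch the page, and Google's documentation states that `noindex` is not effective on pages blocked by robots.txt, so the /u/ and /people/ disallow rules would have to be lifted for the tag to work. A minimal sketch of adding the tag to a rendered page follows; `add_noindex` is a hypothetical helper, not something in the diaspora* codebase.

```python
import re

# The robots meta tag understood by major search engines.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Matches the closing </head> tag, tolerating upper case and stray whitespace.
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def add_noindex(page_html: str) -> str:
    """Return page_html with a noindex meta tag inserted just before </head>.

    Hypothetical helper for illustration only.
    """
    if NOINDEX_TAG in page_html:
        return page_html  # already present; don't duplicate it
    match = _HEAD_CLOSE.search(page_html)
    if match is None:
        raise ValueError("page has no </head> to insert the tag into")
    return page_html[: match.start()] + NOINDEX_TAG + page_html[match.start():]


page = "<html><head><title>Diaspora HQ</title></head><body>...</body></html>"
print(add_noindex(page))
```

Calling it twice leaves the page unchanged the second time, so it is safe to apply in a rendering step that may run more than once.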


Note: This discussion was imported from Loomio.


(Erwan Guyader) #2

As I said here, I believe that public profiles SHOULD be indexed, as people posting public posts probably want them to get some visibility.

However, I would agree with implementing a setting to make a profile private. If this is activated, the profile page shouldn’t be accessible to anybody not sharing with that person.


(goob) #3

I think a possible solution would be to have an option in the user settings: ‘Allow your public profile to be indexed by search engines?’ But I just wanted to raise the issue of what seems to be a discrepancy at the moment.
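Such an option would fit better with a per-page directive than with robots.txt, since the directive travels with each response and public and private profiles can keep the same URL scheme (it still only takes effect on pages crawlers are allowed to fetch). A minimal sketch, assuming a hypothetical `Profile` record with `public` and `allow_indexing` fields; neither is an existing diaspora* setting:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Profile:
    username: str
    public: bool          # hypothetical: profile visible to logged-out visitors
    allow_indexing: bool  # hypothetical: "Allow your public profile to be indexed by search engines?"


def robots_directive(profile: Profile) -> Optional[str]:
    """Return the robots directive for a profile page, or None to leave crawler defaults."""
    if profile.public and profile.allow_indexing:
        return None  # no directive: crawlers fall back to their default (index)
    return "noindex"


def response_headers(profile: Profile) -> Dict[str, str]:
    """Build response headers, adding X-Robots-Tag only when indexing is not allowed."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    directive = robots_directive(profile)
    if directive is not None:
        headers["X-Robots-Tag"] = directive
    return headers


alice = Profile("alice", public=True, allow_indexing=True)  # hypothetical users
bob = Profile("bob", public=True, allow_indexing=False)
print(response_headers(alice))
print(response_headers(bob))
```

Returning no directive for indexable profiles leaves crawlers on their default behaviour; the same value could just as well be written into a `<meta name="robots">` tag instead of a header.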


(Jason Robinson) #4

It would be nice to have a setting for the profile being public or private. We kind of have the “allow searching on Diaspora*” setting, but I’m not sure exactly what it controls. Maybe just modify that to give public profiles more visibility on search engines, and private ones less.


(Flaburgan) #5

This setting has to be added.

(By the way, this was the subject of the discussion I created, linked by Erwan above)


(goob) #6

@jasonrobinson That would be the ideal situation, but I can foresee it being difficult to implement. At the moment there is one robots.txt file in the root of each pod, and if it disallows /u/ and /people/, that applies to both public and private profiles. The only way around that would be to put public profiles in different directories, such as /pu/ and /ppeople/, but this could get messy.

For now, I think it would be good to decide either to allow complete indexing of profile pages or no indexing at all, because the current situation probably pleases no one: people who want a public profile will want it properly indexed, and people who want a private profile will want it not indexed at all.

@flaburgan your discussion was about public posts. This is similar but different, because it relates to the robots.txt file and the indexing of profiles.


(Jason Robinson) #7

@goob I wasn’t talking about robots.txt. Why not just force login for those profiles that are not public, and then check whether the user is allowed to see them (i.e. is in their contacts, etc.)?
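This suggestion moves the problem from crawler directives to access control. A minimal sketch of that check, with hypothetical `User`/`Profile` types and a plain `sharing_with` set standing in for diaspora*'s real contact and aspect model (which is more involved):

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    username: str
    # Hypothetical: usernames of the people this user shares with.
    sharing_with: Set[str] = field(default_factory=set)


@dataclass
class Profile:
    owner: User
    public: bool


class LoginRequired(Exception):
    """Raised when an anonymous visitor requests a non-public profile."""


def can_view_profile(viewer: Optional[User], profile: Profile) -> bool:
    """Decide whether `viewer` (None = not logged in) may see `profile`."""
    if profile.public:
        return True  # public profiles stay visible to everyone, crawlers included
    if viewer is None:
        raise LoginRequired(profile.owner.username)  # force login first
    if viewer.username == profile.owner.username:
        return True  # owners can always see their own profile
    # Only people the owner shares with may see a non-public profile.
    return viewer.username in profile.owner.sharing_with
```

Since crawlers browse anonymously, a non-public profile would only ever send them to the login step, so they would never receive its content; public profiles are unaffected.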


(jonsger) #8

btw:
I searched for Diaspora HQ and other diaspora users on a few search engines. I found that DuckDuckGo, Bing and Yandex.ru either don’t index profile pages or don’t show them in their results.