Language filter or automated translations

We should detect the language of a post (with an option for the user to specify it manually) and allow everyone to filter content by the languages they know.

We suggested it as a Google Summer of Code idea and one student is interested in it. We have one mentor who knows Ruby on Rails, but it would be awesome if someone from the community could also support this.

The discussions so far:
http://lists.smc.org.in/pipermail/student-projects-smc.org.in/2014-March/000076.html
http://wiki.smc.org.in/SoC/2014/Project_ideas#Language_filter_for_diaspora


Note: This discussion was imported from Loomio.

I won’t find time to mentor anything this year, but of course I’m available for questions in #diaspora-dev @ Freenode, usually in the evening hours CET.

Hey,
Nice to know that you are ready to help me with this GSoC project.

I have gone through the schema of the Diaspora project and feel that the language preference for a user can be added to the user_preferences model.
Also, for tagging a post, the acts-as-taggable-on gem can be used (for filtering posts too).
For translation I plan on using the globalize gem.
I have sent a detailed version of this idea as a mail to the mailing list mentioned above.
Kindly review this idea, as I would like to know if I’m on the right track for this feature.

A brief list of action points to implement Language filter for Diaspora:

1) Add a new column called languages_preferred to the users table instead of user_preferences.
2) Tag a post with the language it is written in, using acts-as-taggable-on, by detecting the language through a gem that does local detection and does not depend on any external services.
3) On the receiver’s side, filter incoming posts by looking up his/her language preferences. This also ensures that there is no breach in security or in the protocol used to federate the posts.
4) If necessary, translation of posts (or comments) can also be done using the globalize gem.
5) A UI has to be integrated so that every user can add/edit his/her language preferences.
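
As a rough illustration of steps 1 to 3, the receiver-side filter could look something like the sketch below. It is plain Ruby rather than diaspora code: the `Post` and `User` structs and the `filter_stream` method are hypothetical stand-ins, the language tag is assumed to have been set already by whatever detector is chosen, and only the `languages_preferred` name comes from the list above.

```ruby
require "set"

# Hypothetical stand-ins for the real models; not diaspora's actual classes.
Post = Struct.new(:text, :language)
User = Struct.new(:name, :languages_preferred)

# Keep posts whose language tag is among the reader's preferred languages.
# Posts without a detected language (nil) are kept, and a reader with no
# preferences sees everything, so nothing is hidden by mistake.
def filter_stream(posts, user)
  preferred = user.languages_preferred.to_set
  return posts if preferred.empty?

  posts.select { |post| post.language.nil? || preferred.include?(post.language) }
end

posts = [
  Post.new("Hello world", "en"),
  Post.new("Bonjour le monde", "fr"),
  Post.new("Hallo Welt", "de"),
  Post.new("???", nil)
]
reader = User.new("alice", %w[en fr])

visible = filter_stream(posts, reader)
puts visible.map(&:text).inspect
```

Because the filtering happens only when the receiver's stream is built, the post itself is federated unchanged, which is the point made in step 3.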

Sounds good, except I’m not sure where you’re going with globalize. Translation of posts would be an independent feature to me that needs to be discussed separately (and I don’t think it’s feasible/needed).

Hey,
The translation of a post or comment is just an idea… I guess it should be thought about and discussed after the language filter module has been added completely.

I think translation of posts and comments is an important feature. It is already implemented in Loomio. We can have an option for automatic translation or manual translation.

Also, we need to integrate jquery.ime later to allow input in non-Latin scripts.

Both of these (especially translation) sound to me as though they would be quite heavy on server load.

Hey,
I have verified that the gems (acts-as-taggable-on, whatlanguage) run in the local environment and do not have any external dependencies. Further, the language detection is done only on the receiving pod.
May I know which feature (in your opinion) will increase the server load?
Thank you for your time.
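
To give an idea of what "local detection" means in terms of server load, here is a toy sketch of detection by counting common words. It is not the whatlanguage gem and does not reproduce its API; the `STOPWORDS` table and the `detect_language` method are assumptions made purely for illustration, and a real detector would use much larger word sets.

```ruby
# Tiny, illustrative stopword lists (an assumption, not taken from any gem).
STOPWORDS = {
  "en" => %w[the and is of to in that it],
  "fr" => %w[le la et est les des un une],
  "de" => %w[der die und ist das nicht ein eine]
}.freeze

# Returns the language code with the most stopword hits.
# Returns nil when no stopword matches or when two languages tie.
def detect_language(text)
  words = text.downcase.scan(/\p{L}+/)
  scores = STOPWORDS.transform_values do |list|
    words.count { |word| list.include?(word) }
  end

  best_code, best_score = scores.max_by { |_, score| score }
  return nil if best_score.zero?
  return nil if scores.values.count(best_score) > 1

  best_code
end

puts detect_language("The cat is in the garden").inspect
puts detect_language("Le chat est dans la maison et les chiens").inspect
puts detect_language("12345 !!!").inspect
```

The work is a single pass over the words of a post with in-memory lookups and no network call, so the cost per received post should be small compared with anything that calls an external translation service.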

Hi,

Praveen and I have been discussing my proposal for this project. Praveen wants:

“Also having an option for manual translations would be good too (especially since there is a limitation in automatic translation and also many Indian languages are missing from automatic translations). Users should be able to request translation of a post and others should be able to translate the posts. It would be good to have a ‘Requested translations stream’ for each user and each language (for those who volunteer to translate).”

To which my reply was:

“The option for manual translations, and an interface to manually translate the posts, would be trivial to implement.
Use the ‘globalize’ gem to keep separate translation tables for user-generated content alongside the regular UI constants and UI messages. Untranslated user-generated content is pushed into a separate to-be-translated stream, and the translations are requested from volunteer users. I can include a more detailed technical overview of the feature, and some UI mockups, in my application.”

Please give me any feedback you have on this.
Thank you.

Here is the link to my current proposal:

http://wiki.smc.org.in/Project_Proposal_:Language_Filter_for_diaspora_by_Abhineet_Agarwal

I am updating the proposal accordingly.
Any suggestions or feedback regarding any other feature will be of great help too.

Maybe language detection (and translation?) could be done client-side? Are there any JavaScript libraries out there? Or maybe some sort of third-party online service could be used?

I still heavily oppose mixing these two together. Language detection and filtering is an already discussed topic and I sense a global consensus for it.

However post translation is a completely independent feature, that has very high implications on user experience and the daily usage of diaspora. On a technical level it also has quite a huge impact on the federation protocol.

It just makes no sense at all to mix these together, and I’d like to see a much more thorough discussion in our community about post translation before I’d consider time spent on the implementation details of how to do it justified.

Due to the disadvantages of translation tables in globalize, I have decided to replace the globalize gem with other translation gems (which instead use APIs like Google or Bing). Here are a few of the gems that I have explored:

  1. to_lang ( https://github.com/jimmycuadra/to_lang )
  2. easy_translate ( https://github.com/seejohnrun/easy_translate )
  3. language-translator

I am sure that, apart from these gems, external API calls can also be made directly for the same purpose.

There are also many client-side translators built on jQuery (using the Google API). However, these are third-party software and can raise concerns about the security or robustness of Diaspora.

I am not completely sure whether these would relieve the server of load. Of course, translation can be kept as a secondary idea, to be implemented after the language detection feature has been implemented and tested rigorously. Any feedback on this suggestion?

I agree with what @jonnehass said.
As the initial step the focus should be on detecting the language. That is not a small feature by itself.

Meanwhile we can keep thinking about a way to implement translation that is practical, but also aligns with security and privacy concerns that form the base of what Diaspora stands for.

Those are two separate features and should not be munged into one task.

Hey,
Yes, the initial focus of this GSoC project will be the implementation of language detection and the tagging of relevant posts (as planned in the previous comments). However, there will be a parallel discussion on how to integrate the translation feature as well.

I’m against automatic translation through a third-party API.
If I’m interested, I can always paste the content into translate.google.com myself without sending data to it. I dislike the exposing …

Hi, is there any news on this project? I feel like it would be very useful to be able to filter out posts that are not comprehensible to the user.

How about allowing users to specify known languages in their profile, so that at least it could be used to exclude from the streams the posts of other users who don’t have any language in common?

For example, if I choose English and French and someone else has defined English and Italian, I might still see posts in Italian from that person if we use the same hashtag, but at least I wouldn’t see posts from somebody who has defined Dutch, German and Spanish as favorite languages.

This is far from perfect (unless you only select one language), but it would still improve the streams without using a third-party API (like Google Translate), which is a problem for some people, and I suppose it shouldn’t be a very complex development?
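
The example above boils down to a set intersection between two profiles. A minimal sketch in plain Ruby follows; the `shares_language?` method and the author names are made up for illustration and are not part of diaspora.

```ruby
# True when the two profiles have at least one language in common.
def shares_language?(mine, theirs)
  !(mine & theirs).empty?
end

my_languages = %w[english french]

# Hypothetical authors with the languages declared in their profiles.
authors = {
  "author_a" => %w[english italian],
  "author_b" => %w[dutch german spanish]
}

# Only authors with an overlapping language stay in the stream.
visible_authors = authors.select { |_, langs| shares_language?(my_languages, langs) }.keys
puts visible_authors.inspect
```

As the post says, this works per author rather than per post, so Italian posts from the first author would still show up.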

I would like to have an option to filter out languages other than Finnish and English, as I don’t understand other languages.

Currently I can remove people who mainly write in languages I don’t understand from my aspects, but there are still followed hashtags that aren’t restricted to one specific language, either because the same word exists in both languages or because everyone just uses that word instead of the word in their own language.