(two year bump!)
I hope this isn’t tl;dr, so please be patient and stick with it.
So, Sean’s original question asked,
“What can we fix in our own implementation right now?”
One thing I see asked about and discussed countless times is the federation retry issue.
Our Wiki states:
Will a pod eventually receive federated posts that it misses while being offline/down?
Possibly. We retry the delivery three times at one hour intervals.
#WTF?!
We only try to resend a message/information/etc three times at one hour intervals? THREE TIMES? AT ONE HOUR INTERVALS?!??
No wonder there are so many questions and complaints about posts being missed.
Now, here on Loomio there are countless discussions (literally, I didn’t count them, there are that many) about some huge changes which could be made to make federation more reliable, and they are indeed fantastic, but the reality is that they are very long-term goals in terms of implementation.
What I would like to suggest is a very simple change to the retry functionality.
Once an hour, for only three hours, is completely unrealistic in today’s Diaspora ecosystem. So many pods, so many different connection types. You only need to look at podupti.me to see just how much downtime there actually is.
I would like to see not only the retry intervals made more frequent in the short term, but also the longevity of the retries massively increased, to something more like what a typical SMTP server does when retrying delivery of a message. The SMTP specification, RFC 5321, states:
Retries continue until the message is transmitted or the sender gives up; the give-up time generally needs to be at least 4-5 days
Why on earth do we give up after just three hours?
Is there a technical reason why we couldn’t (easily!) implement message delivery retries along the lines of:
- Retry every 5 mins for six attempts (30 mins)
- Then retry every 1 hour for six attempts (6 hrs)
- Then retry every 3 hours for four attempts (12 hrs)
- Then retry every 6 hours for four attempts (24 hrs)
- Then retry every 12 hours for two attempts (24 hrs)
- Then retry every 24 hours for one attempt (24 hrs)
I’ve just pulled these numbers out of the air. There is no science behind them and they are simply a starting point for discussion.
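To make the proposal easier to poke at, here is a rough sketch of the schedule above as a lookup table. It is plain Python for illustration only, not Diaspora code. The names `RETRY_SCHEDULE`, `retry_delay` and `total_retry_window` are made up for this example and do not correspond to anything in the actual codebase.

```python
from datetime import timedelta

# (interval between attempts, number of attempts) for each tier,
# copied from the list above. Hypothetical names, illustration only.
RETRY_SCHEDULE = [
    (timedelta(minutes=5), 6),
    (timedelta(hours=1), 6),
    (timedelta(hours=3), 4),
    (timedelta(hours=6), 4),
    (timedelta(hours=12), 2),
    (timedelta(hours=24), 1),
]


def retry_delay(attempt):
    """Return the wait before retry number `attempt` (1-based), or None to give up."""
    if attempt < 1:
        raise ValueError("attempt must be 1 or greater")
    remaining = attempt
    for interval, count in RETRY_SCHEDULE:
        if remaining <= count:
            return interval
        remaining -= count
    return None  # schedule exhausted, stop retrying


def total_retry_window():
    """Return the time between the first failure and the final retry."""
    return sum((interval * count for interval, count in RETRY_SCHEDULE), timedelta())


print(total_retry_window())  # prints "3 days, 18:30:00"
```

Adding it up this way gives 23 retries spread over 90.5 hours, so the numbers as written land a little short of the 4-5 days the RFC mentions. An extra 24-hour attempt or two would close that gap, if we wanted to match it.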