I have the exact same problem accessing a dedicated server in the online.net datacenter.
There is no problem after a reboot and no need to change the MTU: the ssh connection works for 1-3 weeks, then this exact same bug appears, blocking at KEXINIT, and it is no longer possible to connect to the ssh server.
It could be some kind of sshd bug, but it has to be triggered by something happening on the network after 1-3 weeks. I reproduced this exact problem many times with many different servers on this network. Some say it could be related to a Cisco bug, possibly connected to some DPI options.
The problem never happened with other servers I manage in other datacenters, which have the exact same distro, config and sshd version.
If you don't want to reboot every 10 days because the datacenter firewalls (or other network tweaks) are doing weird stuff:
First connect with one of these client-side workarounds:
Workaround 1, lowering your local, client-side MTU:
ip link set mtu 1400 dev wlan0
(1400 should be enough, but you can try lower values if needed)
Workaround 2, specifying the cipher for the ssh connection:
ssh -c aes256-gcm@openssh.com host
(or try any other available cipher)
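To pick an MTU value for workaround 1 instead of guessing, one option is to probe the path with pings that are not allowed to fragment. This is only a minimal sketch, assuming Linux iputils ping (its -M do option) and an IPv4 path; the function name ping_payload_for_mtu and the host name are placeholders of mine, not part of the setup described here:

```shell
#!/bin/sh
# Hypothetical helper: ICMP payload size that produces an IPv4 packet of
# exactly the given MTU (20 bytes IPv4 header + 8 bytes ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage (not run here; "host" is a placeholder):
#   ping -M do -c 3 -s "$(ping_payload_for_mtu 1400)" host
#   ping -M do -c 3 -s "$(ping_payload_for_mtu 1500)" host
# If the first command gets replies and the second does not, something on
# the path is probably not passing packets as large as 1500 bytes.
```

If the 1400 probe fails too, repeat with smaller values until replies come back.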
Both of these client-side workarounds worked for me: I could connect and save my uptime. But you want to fix this server-side, permanently, so you don't have to ask every client to tweak their MTU locally.
On Gentoo I just added:
mtu_eth0="1400"
to /etc/conf.d/net
(the same MTU option should be available somewhere in your preferred distro's network config file)
I've set the MTU to 1400, but 1460 is probably enough in most cases.
Another workaround that may help is to use the following iptables rules, which clamp the TCP MSS to the path MTU to avoid fragmentation problems:
# /sbin/iptables -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# /sbin/ip6tables -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
(but I personally haven't needed this one so far)
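If you would rather set a fixed MSS than rely on --clamp-mss-to-pmtu, the value follows from the MTU: subtract the IP and TCP headers (assuming no IP options or extension headers). A sketch of that arithmetic; mss_for_mtu is a name I made up, and --set-mss is the TCPMSS target's option for a fixed value:

```shell
#!/bin/sh
# Hypothetical helper: TCP MSS that fits into a given MTU.
#   $1 = MTU, $2 = IP version (4 or 6, defaults to 4)
# IPv4: 20 bytes IP header + 20 bytes TCP header = 40
# IPv6: 40 bytes IP header + 20 bytes TCP header = 60
mss_for_mtu() {
    case "${2:-4}" in
        4) echo $(( $1 - 40 )) ;;
        6) echo $(( $1 - 60 )) ;;
        *) echo "unknown IP version: $2" >&2; return 1 ;;
    esac
}

# Usage (as root, not run here):
#   iptables  -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss "$(mss_for_mtu 1400 4)"
#   ip6tables -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss "$(mss_for_mtu 1400 6)"
```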
Also note that the symptom of this problem can be a hang at:
debug1: SSH2_MSG_KEXINIT sent
not just
debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY
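To check quickly whether a hung client shows one of these two symptoms, you can look at the last debug1 line of the ssh -v output. A rough sketch; the function name stalled_in_kex and the log file name are hypothetical, and it only recognizes the two messages quoted above:

```shell
#!/bin/sh
# Reads `ssh -v` output on stdin.
# Returns 0 if the last debug1 line is one of the two messages above,
# 1 otherwise (including when there is no debug1 line at all).
stalled_in_kex() {
    last=$(grep '^debug1: ' | tail -n 1)
    case "$last" in
        "debug1: SSH2_MSG_KEXINIT sent"|"debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY")
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Usage (not run here; "host" is a placeholder):
#   timeout 20 ssh -v host true 2> ssh.log
#   stalled_in_kex < ssh.log && echo "looks like the KEXINIT hang"
```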
Edit, March 2016:
Lowering the MTU to 1400 on the server almost always works, but I recently had a case where the MTU was already lowered to 1400 on the server and the problem reappeared; the client also had to lower its MTU to 1400.
The problem also appeared on web login forms, which kept waiting for the page to reload until the browser said "the server have reset the connection"; this was also fixed once the client set its MTU to 1400.
Related links:
https://bugs.launchpad.net/ubuntu/+source/openssh/+bug/1254085
http://www.held.org.il/blog/2011/05/the-myterious-case-of-broken-ssh-client-connection-reset-by-peer/
https://nowhere.dk/articles/natty-narwhal-problems-connecting-to-servers-behind-cisco-firewalls-using-ssh
https://stackoverflow.com/questions/2419412/ssh-connection-stop-at-debug1-ssh2-msg-kexinit-sent
http://www.1-script.com/forums/ssh/ssh-hang-after-ssh2-msg-kexinit-sent-10616-.htm
http://www.snailbook.com/faq/mtu-mismatch.auto.html
SSH2_MSG_KEX_DH_GEX_REPLY) happens much earlier in the connection. – u1686_grawity Dec 08 '10 at 13:46
BTW guys, the problem resolved itself. I didn't do anything, I just tried to log in and it worked. hah – bakytn Dec 08 '10 at 16:01
http://serverfault.com/questions/592059/debug1-expecting-ssh2-msg-kex-dh-gex-group/697350#697350 – dgaavl Jun 08 '15 at 10:49
tun-mtu 1492
mssfix 1400
push tun-mtu 1492
push mssfix 1400
The reason for this is that you don't need to take care of any client configuration. This question is IMHO really important; I stumbled on the problem during a business intervention and dug into it deeper: ssh -vvv said the connection was there, and tcpdump and wireguard pointed me in the right direction. If you still have trouble, reduce the MTU further only in steps of 8. – djdomi Mar 15 '24 at 14:22