Skip to content

Add TCP Fast Open support#1136

Open
oerdnj wants to merge 2 commits into
libuv:v1.xfrom
oerdnj:tcp_fastopen
Open

Add TCP Fast Open support#1136
oerdnj wants to merge 2 commits into
libuv:v1.xfrom
oerdnj:tcp_fastopen

Conversation

@oerdnj

@oerdnj oerdnj commented Nov 12, 2016

Copy link
Copy Markdown
Contributor

This is a preliminary effort to add TCP Fast Open support to libuv. It compiles, but it might be completely broken at the moment.

I am submitting this as PR to get a feedback if this is heading into right direction, or it should be rewritten in a completely different way.

@oerdnj oerdnj force-pushed the tcp_fastopen branch 2 times, most recently from 875270b to ba1bd82 Compare July 15, 2017 10:55
@NawarA

NawarA commented Jul 26, 2017

Copy link
Copy Markdown

Is this still open? Should be reviewed and closed or merged

@saghul saghul self-requested a review July 26, 2017 20:11
saghul
saghul previously requested changes Jul 29, 2017

@saghul saghul left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, I'm sorry it fell through the cracks. Made a first review pass. We need a test too.

Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/tcp.c Outdated
Comment thread src/unix/tcp.c
Comment thread src/unix/tcp.c
Comment thread src/win/tcp.c Outdated
Comment thread docs/src/tcp.rst Outdated
@oerdnj oerdnj force-pushed the tcp_fastopen branch 9 times, most recently from 5bbdc44 to 2631e4c Compare August 2, 2017 11:57
@oerdnj oerdnj changed the title WIP: Add TCP Fast Open support Add TCP Fast Open support Aug 2, 2017
@sebdeckers

Copy link
Copy Markdown

LGTM

@saghul Any thoughts?

@ewindisch ewindisch left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good! I'm really excited for this, this will significantly benefit code I maintain (@iopipe/iopipe).

There's conflicts still and love to get an answer on the TODOs. Thanks!

Comment thread test/echo-server.c
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/stream.c Outdated
Comment thread src/unix/tcp.c Outdated
Comment thread test/test-tcp-fastopen.c Outdated
Comment thread test/test-tcp-fastopen.c Outdated
Comment thread test/test-tcp-fastopen.c Outdated
Comment thread Makefile.am
Comment thread docs/src/tcp.rst
Comment thread include/uv-unix.h Outdated
@oyyd

oyyd commented Oct 21, 2018

Copy link
Copy Markdown
Contributor

Hi, @oerdnj. Great work and it seems stalled now. Would you mind me picking this up and push forward basing on your work?

@oerdnj

oerdnj commented Oct 21, 2018

Copy link
Copy Markdown
Contributor Author

@oyyd Sure, I would be delighted if somebody would finish the work into mergeable state. Thanks!

@oyyd

oyyd commented Mar 3, 2019

Copy link
Copy Markdown
Contributor

@patrickkh7788 No. I found it was a lot more complicated than imagined :-(

@oerdnj

oerdnj commented Mar 3, 2019 via email

Copy link
Copy Markdown
Contributor Author

@oerdnj oerdnj changed the base branch from v1.x to master March 3, 2019 16:47
@oerdnj

oerdnj commented Mar 3, 2019

Copy link
Copy Markdown
Contributor Author

JFTR The MSG_FASTOPEN is not available on macOS (although it defines TCP_FASTOPEN, and the test is broken because of that, so I have to check how to enable TFO on macOS yet.

@edgesite

edgesite commented Mar 4, 2019

Copy link
Copy Markdown

@oerdnj macOS use connectx with flags CONNECT_DATA_IDEMPOTENT to indicate a socket use TFO.
See http://www.manpagez.com/man/2/connectx/ and
https://github.com/facebook/folly/blob/95ceac6552694ee1dc4f6f289a408aa9eeabc262/folly/detail/SocketFastOpen.cpp#L68

@stale stale Bot removed the stale label Apr 8, 2020
Comment thread src/unix/stream.c
do {
n = sendto(uv__stream_fd(stream), iov[0].iov_base, iov[0].iov_len,
MSG_FASTOPEN, (const struct sockaddr*)addr, addrlen);
} while (n == -1 && (RETRY_ON_WRITE_ERROR(errno) || errno == EINPROGRESS));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EINPROGRESS has to be handled differently than re-trying sendto() again (which only results in EALREADY).

I think when EINPROGRESS is returned, it should be handled similarly to when it happens in uv__tcp_connect(). The code as is doesn't seem to be able to handle subsequent writes to the stream until the connection is established.

@stale

stale Bot commented May 6, 2020

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label May 6, 2020
@stale stale Bot removed stale labels Jul 28, 2020
@vtjnash vtjnash dismissed saghul’s stale review July 28, 2020 13:32

Looks like these have been addressed

@stale

stale Bot commented Aug 21, 2020

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label Aug 21, 2020
@stale stale Bot closed this Jun 2, 2021
@vtjnash vtjnash reopened this Jun 2, 2021
@stale stale Bot removed the stale label Jun 2, 2021
@stale

stale Bot commented Jun 26, 2021

Copy link
Copy Markdown

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale Bot added the stale label Jun 26, 2021
@sharadraju

sharadraju commented Jan 25, 2024

Copy link
Copy Markdown

Any updates on this PR?
This will be very useful for us to add support for TCP Fast Open in Node.js. We are looking for TCP Fast Open support for our node-oracledb driver for Oracle Database.

@bnoordhuis

Copy link
Copy Markdown
Member

My recollection (it's been a while) is that this PR is unlikely to get merged anytime soon because:

  1. TFO is effectively Linux-only at the moment, and

  2. Even on Linux it's not always usable/enabled/etc.

@vcunat

vcunat commented Jan 25, 2024

Copy link
Copy Markdown
Contributor

Libraries not supporting a feature because it's not commonly used? Isn't that a circular argument?

@oerdnj

oerdnj commented Jan 25, 2024

Copy link
Copy Markdown
Contributor Author

I’ve seen the discussion about TFO for njs, so I guess I can rebase this on top of current v1.x and try to get this merged again.

@sharadraju

Copy link
Copy Markdown

I’ve seen the discussion about TFO for njs, so I guess I can rebase this on top of current v1.x and try to get this merged again.

Great! Thanks @oerdnj . That would be really helpful!

@bnoordhuis

Copy link
Copy Markdown
Member

Libraries not supporting a feature because it's not commonly used? Isn't that a circular argument?

No, because that's not the argument. Libuv is a platform abstraction layer. If Linux is the only platform that implements TFO, then it makes no sense to support that in libuv because there's nothing to abstract.

The fact that it doesn't even always work on Linux is just icing on the cake.

@sreguna

sreguna commented Jan 25, 2024

Copy link
Copy Markdown

I see that Windows support is also available through the ConnectEx API. https://learn.microsoft.com/en-us/windows/win32/api/Mswsock/nc-mswsock-lpfn_connectex
TCP_FASTOPEN can be enabled on the socket.
https://learn.microsoft.com/en-us/windows/win32/winsock/ipproto-tcp-socket-options

@saghul

saghul commented Jan 25, 2024

Copy link
Copy Markdown
Member

TIL! This doesn't sound very promising, however:

Note that to make use of fast opens, you should use [ConnectEx](https://learn.microsoft.com/en-us/windows/desktop/api/Mswsock/nc-mswsock-lpfn_connectex) to make the initial connection, and specify data in that function's lpSendBuffer parameter to be transferred during the handshake process. Some of the data in lpSendBuffer will be transferred under the Fast Open protocol.

So seentially we could only support that if you call uv_write before uv_connect actually does the connecting... sounds pretty clunky to use IMHO.

@oerdnj

oerdnj commented Jan 25, 2024

Copy link
Copy Markdown
Contributor Author

TIL! This doesn't sound very promising, however:

Note that to make use of fast opens, you should use [ConnectEx](https://learn.microsoft.com/en-us/windows/desktop/api/Mswsock/nc-mswsock-lpfn_connectex) to make the initial connection, and specify data in that function's lpSendBuffer parameter to be transferred during the handshake process. Some of the data in lpSendBuffer will be transferred under the Fast Open protocol.

So seentially we could only support that if you call uv_write before uv_connect actually does the connecting... sounds pretty clunky to use IMHO.

There are already places with delayed syscalls, aren’t they? We can delay the connect system call until the first write happens.

@saghul

saghul commented Jan 25, 2024

Copy link
Copy Markdown
Member

There are already places with delayed syscalls, aren’t they? We can delay the connect system call until the first write happens.

That could work!

@bnoordhuis

Copy link
Copy Markdown
Member

We can delay the connect system call until the first write happens.

Can you elaborate on how that would work? I can parse that sentence in multiple ways, none which seem feasible or desirable.

@saghul

saghul commented Jan 26, 2024

Copy link
Copy Markdown
Member

I'd assume it means to do something like:

h = uv_tcp_init(...);
uv_tcp_fastopen(h, 1);
uv_connect(...)
uv_write(...)
# connect would happen when uv_write is called, on Windows at least

@oerdnj

oerdnj commented Jan 26, 2024

Copy link
Copy Markdown
Contributor Author

I'd assume it means to do something like:

h = uv_tcp_init(...);
uv_tcp_fastopen(h, 1);
uv_connect(...)
uv_write(...)
# connect would happen when uv_write is called, on Windows at least

Well, in fact, the PR already does that on Linux. It just have been so long, that I've forgotten what I did here :).

@bnoordhuis

Copy link
Copy Markdown
Member

Yeah, that was one of the possible interpretations. :-)

Okay, I see a couple of problems with that:

  1. undesirable: also needs to special-case connect + read; more special cases == more ugly

  2. undesirable: breaks code that only starts writing from uv_connect_cb; makes uv_tcp_fastopen() not a drop-in replacement

  3. undesirable: delaying the connect() call regresses handshake performance

IMO, it'd be best to go in the write-before-connect direction. Clunky but no other undesirable attributes.

@oerdnj

oerdnj commented Jan 26, 2024

Copy link
Copy Markdown
Contributor Author

Yeah, that was one of the possible interpretations. :-)

Okay, I see a couple of problems with that:

  1. undesirable: also needs to special-case connect + read; more special cases == more ugly

The only special case needed would be that you need to start writing (uv_write()) before you start reading (uv_read_start()), otherwise TFO would be ineffective.

Note: uv_try_write() should always fail in not connected yet, so we can process the connect failures.

  1. undesirable: breaks code that only starts writing from uv_connect_cb; makes uv_tcp_fastopen() not a drop-in replacement

Why? The uv_connect() would call the callback immediately (without calling system connect()), so in fact, it would be a drop in replacement.

  1. undesirable: delaying the connect() call regresses handshake performance

Can you elaborate a bit more? I don't understand what you mean here.

IMO, it'd be best to go in the write-before-connect direction. Clunky but no other undesirable attributes.

@bnoordhuis

Copy link
Copy Markdown
Member

Pretty much all downstream code currently is structured like this:

uv_connect(connect_req, handle, [](...) {
  uv_write(...);
  // or:
  uv_read_start(...);
});

That won't work if the connect req doesn't do anything until the first uv_read_start() or uv_write().

Apropos:

undesirable: delaying the connect() call regresses handshake performance

Example: Node's http connection pool opens a bunch of connections to the host but doesn't do anything with them until there's a request ready to be sent.

At the moment uv_tcp_connect() calls connect() straight away. The kernel performs the TCP handshake in the background. The connections are ready by the time they're needed.

With your proposal, the connect() call isn't done until the request is sent. That adds latency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.