Update download URLs for SDSS DR18#3017
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3017 +/- ##
==========================================
- Coverage 67.60% 67.30% -0.31%
==========================================
Files 233 231 -2
Lines 18271 18280 +9
==========================================
- Hits 12352 12303 -49
- Misses 5919 5977 +58 ☔ View full report in Codecov by Sentry. |
|
Hello @weaverba137! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2024-06-28 00:05:38 UTC |
|
This is ready for review. Before merging there is an important question to consider: The I've been informed that in DR19 there will be a unified way to access all SDSS/BOSS/eBOSS/eFEDS spectra in a single table, so I propose possibly documenting the Summary of changes:
I can squash commits if necessary. There was a bit of churn getting tests to work. |
bsipocz
left a comment
There was a problem hiding this comment.
These all look great, than you so much for the fix.
Two things, neither of which I would hold this PR up for:
- Can/should we change the default DR to 18?
- Running the remote tests I got a different exception for
test_query_timeout
================================================================================= FAILURES =================================================================================
____________________________________________________________________ TestSDSSRemote.test_query_timeout _____________________________________________________________________
self = <urllib3.connectionpool.HTTPSConnectionPool object at 0x11bd9d430>, conn = <urllib3.connection.HTTPSConnection object at 0x11b2b0b00>, method = 'GET'
url = '/dr17/en/tools/search/x_results.aspx?cmd=+SELECT%0D+p.ra%2C+p.dec%2C+p.objid%2C+p.run%2C+p.rerun%2C+p.camcol%2C+p.fie...8%28%28p.ra+BETWEEN+2.02319+AND+2.02374%29+AND+%28p.dec+BETWEEN+14.8395+AND+14.8401%29%29%29&format=csv&searchtool=SQL'
timeout = Timeout(connect=0.01, read=0.01, total=None), chunked = False
httplib_request_kw = {'body': None, 'headers': {'User-Agent': 'astroquery/0.4.8.dev9345_testrun_testrun Python/3.12.1 (Darwin) python-reque...=HCNLOJOANGIBHJGAJMLLPOKI; ASPSESSIONIDSQSCCQRQ=ICNLOJOAOBLOGAIACNDFMJJD; ASP.NET_SessionId=nlotaiz2ednb1acriy14pjms'}}
timeout_obj = Timeout(connect=0.01, read=0.01, total=None), read_timeout = 0.01
def _make_request(
self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
):
"""
Perform a request on a given urllib connection object taken from our
pool.
:param conn:
a connection from one of our connection pools
:param timeout:
Socket timeout in seconds for the request. This can be a
float or integer, which will set the same timeout value for
the socket connect and the socket read, or an instance of
:class:`urllib3.util.Timeout`, which gives you more fine-grained
control over your timeouts.
"""
self.num_requests += 1
timeout_obj = self._get_timeout(timeout)
timeout_obj.start_connect()
conn.timeout = Timeout.resolve_default_timeout(timeout_obj.connect_timeout)
# Trigger any extra validation we need to do.
try:
self._validate_conn(conn)
except (SocketTimeout, BaseSSLError) as e:
# Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
raise
# conn.request() calls http.client.*.request, not the method in
# urllib3.request. It also calls makefile (recv) on the socket.
try:
if chunked:
conn.request_chunked(method, url, **httplib_request_kw)
else:
conn.request(method, url, **httplib_request_kw)
# We are swallowing BrokenPipeError (errno.EPIPE) since the server is
# legitimately able to close the connection after sending a valid response.
# With this behaviour, the received response is still readable.
except BrokenPipeError:
# Python 3
pass
except IOError as e:
# Python 2 and macOS/Linux
# EPIPE and ESHUTDOWN are BrokenPipeError on Python 2, and EPROTOTYPE is needed on macOS
# https://erickt.github.io/blog/2014/11/19/adventures-in-debugging-a-potential-osx-kernel-bug/
if e.errno not in {
errno.EPIPE,
errno.ESHUTDOWN,
errno.EPROTOTYPE,
}:
raise
# Reset the timeout for the recv() on the socket
read_timeout = timeout_obj.read_timeout
# App Engine doesn't have a sock attr
if getattr(conn, "sock", None):
# In Python 3 socket.py will catch EAGAIN and return None when you
# try and read into the file pointer created by http.client, which
# instead raises a BadStatusLine exception. Instead of catching
# the exception and assuming all BadStatusLine exceptions are read
# timeouts, check for a zero timeout before making the request.
if read_timeout == 0:
raise ReadTimeoutError(
self, url, "Read timed out. (read timeout=%s)" % read_timeout
)
if read_timeout is Timeout.DEFAULT_TIMEOUT:
conn.sock.settimeout(socket.getdefaulttimeout())
else: # None or a value
conn.sock.settimeout(read_timeout)
# Receive the response from the server
try:
try:
# Python 2.7, use buffering of HTTP responses
httplib_response = conn.getresponse(buffering=True)
except TypeError:
# Python 3
try:
httplib_response = conn.getresponse()
except BaseException as e:
# Remove the TypeError from the exception chain in
# Python 3 (including for exceptions like SystemExit).
# Otherwise it looks like a bug in the code.
> six.raise_from(e, None)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:467:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:462: in _make_request
httplib_response = conn.getresponse()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:1419: in getresponse
response.begin()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:331: in begin
version, status, reason = self._read_status()
../../../.pyenv/versions/3.12.1/lib/python3.12/http/client.py:292: in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
../../../.pyenv/versions/3.12.1/lib/python3.12/socket.py:707: in readinto
return self._sock.recv_into(b)
../../../.pyenv/versions/3.12.1/lib/python3.12/ssl.py:1253: in recv_into
return self.read(nbytes, buffer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <ssl.SSLSocket [closed] fd=-1, family=2, type=1, proto=0>, len = 8192, buffer = <memory at 0x11ba80100>
def read(self, len=1024, buffer=None):
"""Read up to LEN bytes and return them.
Return zero-length string on EOF."""
self._checkClosed()
if self._sslobj is None:
raise ValueError("Read on closed or unwrapped SSL socket.")
try:
if buffer is not None:
> return self._sslobj.read(len, buffer)
E TimeoutError: The read operation timed out
../../../.pyenv/versions/3.12.1/lib/python3.12/ssl.py:1105: TimeoutError
During handling of the above exception, another exception occurred:
self = <requests.adapters.HTTPAdapter object at 0x11bb29e20>, request = <PreparedRequest [GET]>, stream = False, timeout = Timeout(connect=0.01, read=0.01, total=None)
verify = True, cert = None, proxies = OrderedDict()
def send(
self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None
):
"""Sends PreparedRequest object. Returns Response object.
:param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
:param stream: (optional) Whether to stream the request content.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple or urllib3 Timeout object
:param verify: (optional) Either a boolean, in which case it controls whether
we verify the server's TLS certificate, or a string, in which case it
must be a path to a CA bundle to use
:param cert: (optional) Any user-provided SSL certificate to be trusted.
:param proxies: (optional) The proxies dictionary to apply to the request.
:rtype: requests.Response
"""
try:
conn = self.get_connection(request.url, proxies)
except LocationValueError as e:
raise InvalidURL(e, request=request)
self.cert_verify(conn, request.url, verify, cert)
url = self.request_url(request, proxies)
self.add_headers(
request,
stream=stream,
timeout=timeout,
verify=verify,
cert=cert,
proxies=proxies,
)
chunked = not (request.body is None or "Content-Length" in request.headers)
if isinstance(timeout, tuple):
try:
connect, read = timeout
timeout = TimeoutSauce(connect=connect, read=read)
except ValueError:
raise ValueError(
f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, "
f"or a single float to set both timeouts to the same value."
)
elif isinstance(timeout, TimeoutSauce):
pass
else:
timeout = TimeoutSauce(connect=timeout, read=timeout)
try:
> resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/adapters.py:486:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:799: in urlopen
retries = retries.increment(
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/util/retry.py:550: in increment
raise six.reraise(type(error), error, _stacktrace)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/packages/six.py:770: in reraise
raise value
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:715: in urlopen
httplib_response = self._make_request(
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:469: in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <urllib3.connectionpool.HTTPSConnectionPool object at 0x11bd9d430>, err = TimeoutError('The read operation timed out')
url = '/dr17/en/tools/search/x_results.aspx?cmd=+SELECT%0D+p.ra%2C+p.dec%2C+p.objid%2C+p.run%2C+p.rerun%2C+p.camcol%2C+p.fie...8%28%28p.ra+BETWEEN+2.02319+AND+2.02374%29+AND+%28p.dec+BETWEEN+14.8395+AND+14.8401%29%29%29&format=csv&searchtool=SQL'
timeout_value = 0.01
def _raise_timeout(self, err, url, timeout_value):
"""Is the error actually a timeout? Will raise a ReadTimeout or pass"""
if isinstance(err, SocketTimeout):
> raise ReadTimeoutError(
self, url, "Read timed out. (read timeout=%s)" % timeout_value
)
E urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/urllib3/connectionpool.py:358: ReadTimeoutError
During handling of the above exception, another exception occurred:
self = <astroquery.sdss.tests.test_sdss_remote.TestSDSSRemote object at 0x11bdcc980>
def test_query_timeout(self):
with pytest.raises(ConnectTimeout):
> sdss.SDSS.query_region(self.coords, width=2.0 * u.arcsec, cache=False, timeout=self.mintimeout)
astroquery/sdss/tests/test_sdss_remote.py:187:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
astroquery/utils/class_or_instance.py:25: in f
return self.fn(obj, *args, **kwds)
astroquery/utils/process_asyncs.py:26: in newmethod
response = getattr(self, async_method_name)(*args, **kwargs)
astroquery/sdss/core.py:387: in query_region_async
return self.query_sql_async(sql_query, timeout=timeout,
astroquery/sdss/core.py:622: in query_sql_async
response = self._request("GET", url, params=request_payload,
astroquery/query.py:368: in _request
response = query.request(self._session, stream=stream,
astroquery/query.py:77: in request
return session.request(self.method, self.url, params=self.params,
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/sessions.py:589: in request
resp = self.send(prep, **send_kwargs)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/sessions.py:703: in send
r = adapter.send(request, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <requests.adapters.HTTPAdapter object at 0x11bb29e20>, request = <PreparedRequest [GET]>, stream = False, timeout = Timeout(connect=0.01, read=0.01, total=None)
verify = True, cert = None, proxies = OrderedDict()
def send(
self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None
):
"""Sends PreparedRequest object. Returns Response object.
:param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
:param stream: (optional) Whether to stream the request content.
:param timeout: (optional) How long to wait for the server to send
data before giving up, as a float, or a :ref:`(connect timeout,
read timeout) <timeouts>` tuple.
:type timeout: float or tuple or urllib3 Timeout object
:param verify: (optional) Either a boolean, in which case it controls whether
we verify the server's TLS certificate, or a string, in which case it
must be a path to a CA bundle to use
:param cert: (optional) Any user-provided SSL certificate to be trusted.
:param proxies: (optional) The proxies dictionary to apply to the request.
:rtype: requests.Response
"""
try:
conn = self.get_connection(request.url, proxies)
except LocationValueError as e:
raise InvalidURL(e, request=request)
self.cert_verify(conn, request.url, verify, cert)
url = self.request_url(request, proxies)
self.add_headers(
request,
stream=stream,
timeout=timeout,
verify=verify,
cert=cert,
proxies=proxies,
)
chunked = not (request.body is None or "Content-Length" in request.headers)
if isinstance(timeout, tuple):
try:
connect, read = timeout
timeout = TimeoutSauce(connect=connect, read=read)
except ValueError:
raise ValueError(
f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, "
f"or a single float to set both timeouts to the same value."
)
elif isinstance(timeout, TimeoutSauce):
pass
else:
timeout = TimeoutSauce(connect=timeout, read=timeout)
try:
resp = conn.urlopen(
method=request.method,
url=url,
body=request.body,
headers=request.headers,
redirect=False,
assert_same_host=False,
preload_content=False,
decode_content=False,
retries=self.max_retries,
timeout=timeout,
chunked=chunked,
)
except (ProtocolError, OSError) as err:
raise ConnectionError(err, request=request)
except MaxRetryError as e:
if isinstance(e.reason, ConnectTimeoutError):
# TODO: Remove this in 3.0.0: see #2811
if not isinstance(e.reason, NewConnectionError):
raise ConnectTimeout(e, request=request)
if isinstance(e.reason, ResponseError):
raise RetryError(e, request=request)
if isinstance(e.reason, _ProxyError):
raise ProxyError(e, request=request)
if isinstance(e.reason, _SSLError):
# This branch is for urllib3 v1.22 and later.
raise SSLError(e, request=request)
raise ConnectionError(e, request=request)
except ClosedPoolError as e:
raise ConnectionError(e, request=request)
except _ProxyError as e:
raise ProxyError(e)
except (_SSLError, _HTTPError) as e:
if isinstance(e, _SSLError):
# This branch is for urllib3 versions earlier than v1.22
raise SSLError(e, request=request)
elif isinstance(e, ReadTimeoutError):
> raise ReadTimeout(e, request=request)
E requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
../../../.pyenv/versions/3.12.1/lib/python3.12/site-packages/requests/adapters.py:532: ReadTimeout
========================================================================= short test summary info ==========================================================================
FAILED astroquery/sdss/tests/test_sdss_remote.py::TestSDSSRemote::test_query_timeout - requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='skyserver.sdss.org', port=443): Read timed out. (read timeout=0.01)
====================================================================== 1 failed, 681 passed in 45.06s ======================================================================
|
Other things:
Sounds totally reasonable to wait. Could you open a reminder issue for this though?
Yes, it would be great to squash the iterations of tests/style a little bit. |
|
@bsipocz when you got the different exception above, could you describe what environment you were using? It's probably sufficient to just list the Python and requests version. |
|
I see the ReadTimeout on py3.12 and request 2.31 but also when upgraded to the latest 2.32, but also can reproduce it with the tox environment |
|
I don't think we've quite converged on a consistent set of exceptions to use, but everything else should be done at this point. |
|
I used |
|
I still see the failure, even on the latest requests version |
|
@bsipocz I wonder if your network is so fast that you are seeing a timeout after a connection is established, while I am seeing a timeout before a connection is established. In any case, I think the correct exception is actually: |
|
OK, please test with |
|
Now it all works. I don't think I have a particularly fast internet, but it may indeed routed differently than if you were testing from a university network. Anyway, the suggested solution is correct, we shouldn't really care what type of timeout one runs into. |
|
Maybe a rebase to get rid of the merge and squash some of the back-and-forths? |
|
Great! You should be able to squash when you merge, according to what I've read recently about merging on GitHub. |
|
We don't do squash merges here as it doesn't fit into the workflow, but I'm happy to do the rebase/squash prior merging. |
|
That's OK, I'll go ahead with the squash now. It's good practice for me. |
Bikeshed, but if possible don't squash down to a single commit, it was more work that that, as well as there were more logical chunks :) |
c3a0546 to
10885b2
Compare
|
I fixed conflicts in CHANGES.rst 3 or 4 times. If you want to squash further, I'll leave that to you. |
I'm not sure what went wrong, but the first commit here, that is the squashed everything, is clearly contains unrelated changes that have been picked up from |
|
Thank you @weaverba137! |
This PR closes #3011.
Background: SDSS-V has changed the paths used to access data files. Both imaging and spectroscopic data are affected. As of SDSS-V/DR18, the SQL access does not appear to have changed. Updating these URLs may continue to work throughout SDSS-V, but that is not guaranteed. The SDSS-V data team recommends using sdss-access.
platenumbers do work.astroquery.sdssever return a SDSS-V spectrum?fieldID-MJD-fiberIDinstead ofplate-MJD-fiberID.