Gaia astroquery 1.0#2077
Conversation
keflavich
left a comment
There was a problem hiding this comment.
I have a few stylistic change requests, but otherwise I think this is in good shape.
|
@mhsarmiento - you could please also rebase this and address the review? |
e642c9d to
efa491d
Compare
efa491d to
2581020
Compare
f1857e8 to
3c70940
Compare
|
Hello @mhsarmiento! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-02-24 14:23:14 UTC |
Codecov Report
@@ Coverage Diff @@
## main #2077 +/- ##
==========================================
+ Coverage 62.97% 63.08% +0.11%
==========================================
Files 131 131
Lines 16922 17008 +86
==========================================
+ Hits 10656 10729 +73
- Misses 6266 6279 +13
Continue to review full report at Codecov.
|
bsipocz
left a comment
There was a problem hiding this comment.
It appears that I had a review started a while ago but it hasn't been submitted yet, I'm sorry for the delay.
There are a couple of comments, most of them are minor.
| Created on 30 jun. 2016 | ||
|
|
||
| Modified on 1 jun. 2021 by mhsarmiento | ||
| Version: gaia-astroquery-1.0 |
There was a problem hiding this comment.
where is this version number coming from? For astroquery this will land in either 0.4.6/7 or 4.0, depending on how quickly we overhaul our infrastructure. In fact I think this type of metadata at the top of the files are not very useful, as the full git history (available on a per-file bases if needed) is available, and is allways more accurate.
|
|
||
| def rename_table(self, table_name=None, new_table_name=None, new_column_names_dict={}, | ||
| verbose=False): | ||
| """ This new method allows you to update the column names of a user table. |
There was a problem hiding this comment.
No need to say "new" in the docstring as that line will stay with the method for years to come.
There was a problem hiding this comment.
Change applied. Thanks!
|
|
||
| Parameters | ||
| ---------- | ||
| table_name: str, required |
There was a problem hiding this comment.
required keywords are usually not noted in the docstring.
There was a problem hiding this comment.
Change applied. Thanks!
| TapPlus.rename_table.rename_table(table_name=old_table_name, new_table_name=new_table_name, | ||
| new_column_names_dict=[old_column1:new_column1, old_column2:new_column2, ...]) | ||
| """ | ||
| return TapPlus.rename_table(self, table_name=table_name, new_table_name=new_table_name, |
There was a problem hiding this comment.
Maybe leave a comment in the code that notes that the validity of the required keywords are checked in TapPlus.rename_table itself, and therefore was omitted from this method.
| args = {} | ||
|
|
||
| if table_name is None: | ||
| raise ValueError("Table name cannot be null") |
There was a problem hiding this comment.
I would rephrase to make it an absolutely clear error message by mentioning the name of the kwarg. Also, saying out explicitly that providing a value is mandatory rather than saying null is more helpful for for the users. Thanks!
There was a problem hiding this comment.
Change applied. Thanks!
| context=None, | ||
| body=tableData, | ||
| headers=None) | ||
| # tableRequest = f"tables?tables={tableName}" |
There was a problem hiding this comment.
Please don't have commented out code. Either remove the comments if they are needed/useful test, or remove them altogether.
There was a problem hiding this comment.
Suggested change applied. Thanks!
| # data = connHandler.url_encode(dictArgs) | ||
| # response = connHandler.execute_table_tool(data, verbose=verbose) | ||
|
|
||
| with pytest.raises(Exception): |
There was a problem hiding this comment.
Would it be possible to check for the specific exceptions? These would catch and pass with any exceptions, whether it's the expected one or not.
There was a problem hiding this comment.
Not sure yet which type of exceptions are risen here. I have already open an issue on my side to improve this part of the code in future versions. Thank you very much for pointing this out.
There was a problem hiding this comment.
Having an issue open in the astroquery repo may be useful, too, so we see what has come up already.
There was a problem hiding this comment.
Thanks, I will take into account
| It is possible to choose which data release to query, by default the Gaia DR2 catalogue is used. For example: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| >>> Gaia.MAIN_GAIA_TABLE = "gaiadr2.gaia_source" # Select Data Release 2, default | ||
| >>> Gaia.MAIN_GAIA_TABLE = "gaiaedr3.gaia_source" # Select early Data Release 3 | ||
|
|
There was a problem hiding this comment.
This example was added after we received plenty of user comment about the default release, is there a particular reason that it should be removed?
(Even when the default changes from DR2, I think it's useful to have an example that shows how to change it to use a different DR)
There was a problem hiding this comment.
Sorry for this. Was due to a problem during the rebase. I have restored the example.
| >>> from astroquery.utils.tap.core import TapPlus, TAP_CLIENT_ID | ||
| >>> from astroquery.utils.tap import taputils | ||
| >>> gaia = GaiaClass(gaia_tap_server='https://<env>.esac.esa.int/', gaia_data_server='https://<env>.esac.esa.int/') | ||
| >> >gaia.login() |
There was a problem hiding this comment.
minor typo
| >> >gaia.login() | |
| >>> gaia.login() |
There was a problem hiding this comment.
Suggested change applied
| >>> from astroquery.utils.tap.model.tapcolumn import TapColumn | ||
| >>> from astroquery.utils.tap.core import TapPlus, TAP_CLIENT_ID | ||
| >>> from astroquery.utils.tap import taputils | ||
| >>> gaia = GaiaClass(gaia_tap_server='https://<env>.esac.esa.int/', gaia_data_server='https://<env>.esac.esa.int/') |
There was a problem hiding this comment.
what is <env>? I don't see mentioned anywhere above (unlike user_<user_login_name>). I think an example would be useful.
There was a problem hiding this comment.
Example updated.
Again thank you very much for your time and suggestions.
eerovaher
left a comment
There was a problem hiding this comment.
I have a few comments and suggestions, but also a few questions.
| if not output_file.endswith(compressed_extension): | ||
| warnings.warn('WARNING!!! By default, results in "votable" and "fits" format are returned in ' | ||
| f'compressed format therefore your file {output_file} ' | ||
| f'will be renamed to {output_file}.gz') |
There was a problem hiding this comment.
Is the fact that votable and fits files are compressed a part of the TAP+ protocol or is it specific to the Gaia archive?
There was a problem hiding this comment.
It is from the TAP+ protocol. By default certain type of output formats are returned in compressed format
There was a problem hiding this comment.
If it's a feature of the TAP+ protocol then it would be better to open a separate pull request for implementing this in the TapPlus class.
There was a problem hiding this comment.
Sorry, I'm not sure if I understand here what do you want me to do. TAP+ returning a compressed format is not something that has been implemented now.
The problem here was that when a user gives a name, for example "myVotable.vot" for the file and he/she does not expect that the file is going to be provided in compressed format and then he/she tries to open the file, the user is going to get and error because the file is compressed. In order to avoid that, we were requested to rename the file with the correct extension if the user didn't provided that at first 'myVotable.vot.gz', so they don't get that error when opening the file.
There was a problem hiding this comment.
If the votable and fits files are compressed only by the Gaia archive then this code belongs here. But from your previous answer I got the impression that those file types being compressed is required by the TAP+ standard, and if that is the case then this code needs to go to the TapPlus class so that all astroquery modules connecting to TAP+ services could benefit from it.
There was a problem hiding this comment.
It required by TAP+. I can move it to TapPlus for the next implementation but we have dealing with this pull request since past June and I/we would really appreciate if we can move on with this. Specially if as you have suggested, we need to do a different pull request for the TAP+ modifications. I will take this also into account from now on.
There was a problem hiding this comment.
This pull request would implement many improvements to astroquery.gaia, but at least a few of them seem to be independent from the others. The problem with such a large pull request is that it can only be merged all at once, so if any of the proposed improvements requires some more work then merging all the other improvements has to be delayed too. Opening a separate pull request for each independent improvement would preferable.
There was a problem hiding this comment.
Dear @eerovaher , @mhsarmiento ,
I think there is a misunderstanding here. Considering previous comments, what you are proposing is that we include that warning as a check in the launch_job method from TAPPlus class, so the rest of our modules could benefit from it. Is that correct? And that this improvement should be included in a different pull request, am I wrong?
If that is the case, I propose this:
- On Monday, I will extract this piece of code.
- I will create a new branch from astroquery main to include only this feature.
- After testing it and checking that it works for all ESA modules, I will create the pull request. As this is something simple (just a few lines), it should be merged fast.
- Once it is accepted and in astroquery main branch, we will rebase this PR branch to include it and remove the code from Gaia module.
What do you think?
Have a nice weekend!
There was a problem hiding this comment.
My understanding based on the answers above is that changing file name extensions of compressed formats would be useful for all TAP+ services, and in that case it should be implemented in the TapPlus class to avoid code repetition. But I am not familiar with the TAP+ standard so it is possible I have misunderstood something. If I am mistaken and this code is useful only for the Gaia archive then it should stay where it is.
If the code in question does belong in TapPlus then I would say it would have been better to open a separate pull request. But the code is already included here, so opening another pull request at this point might not be worth the effort. It is best to ask @keflavich, @bsipocz or @ceb8 what they think.
As this is something simple (just a few lines), it should be merged fast.
I do not have write access to this repository, so I cannot make any promises about when this or any other pull request gets merged, but I would expect that smaller and simpler pull requests get merged faster.
| upload_table_name=upload_table_name, | ||
| autorun=autorun) | ||
|
|
||
| def rename_table(self, table_name=None, new_table_name=None, new_column_names_dict={}, |
There was a problem hiding this comment.
I don't see why this method exists at all. GaiaClass is a subclass of TapPlus, so it could simply inherit the rename_table method from there.
There was a problem hiding this comment.
Suggested change applied. Thanks!
| self.__connPortSsl = sslport | ||
| if server_context is not None: | ||
| if(server_context.startswith("/")): | ||
| if (server_context.startswith("/")): |
There was a problem hiding this comment.
| if (server_context.startswith("/")): | |
| if server_context.startswith("/"): | |
There was a problem hiding this comment.
Suggested change applied. Thanks!
| def __create_context(self, context): | ||
| if (context is not None and context != ""): | ||
| if(str(context).startswith("/")): | ||
| if (str(context).startswith("/")): |
There was a problem hiding this comment.
| if (str(context).startswith("/")): | |
| if str(context).startswith("/"): | |
There was a problem hiding this comment.
Suggested change applied. Thanks!
| "tapclient": str(TAP_CLIENT_ID), | ||
| "QUERY": str(query), | ||
| "UPLOAD": ""+str(uploadValue)} | ||
| "UPLOAD": "" + str(uploadValue)} |
There was a problem hiding this comment.
| "UPLOAD": "" + str(uploadValue)} | |
| "UPLOAD": str(uploadValue)} | |
| if key == '': | ||
| raise ValueError(f" Old column name introduced "f"{key}"f" cannot be empty") |
There was a problem hiding this comment.
Instead of checking if key == '' wouldn't it be better to check if key is actually present in the table? Or is that not possible?
| if key == value: | ||
| raise ValueError(f" Old column name introduced "f"{key}"f" and new column name introduced " | ||
| f" "f"{key}"f" cannot be the same") |
There was a problem hiding this comment.
Wouldn't it be better to check if the new column name is already in the table or not? Or is that not possible?
| raise ValueError(f" Old column name introduced "f"{key}"f" and new column name introduced " | ||
| f" "f"{key}"f" cannot be the same") | ||
| if value == "": | ||
| raise ValueError(f" New column name introduced "f"{value}"f" cannot be empty") |
There was a problem hiding this comment.
| raise ValueError(f" New column name introduced "f"{value}"f" cannot be empty") | |
| raise ValueError(f"New name of {key} cannot be an empty string") | |
| new_pair = key + ":" + value | ||
| new_column_names = new_column_names + new_pair | ||
| count = count + 1 | ||
| if count < len(new_column_names_dict): | ||
| new_column_names = new_column_names + ',' |
There was a problem hiding this comment.
Assuming that the preceding code has verified all the data then the following would be more pythonic:
| new_pair = key + ":" + value | |
| new_column_names = new_column_names + new_pair | |
| count = count + 1 | |
| if count < len(new_column_names_dict): | |
| new_column_names = new_column_names + ',' | |
| new_column_names = ','.join(f'{key}:{value}' for key, value in new_column_names_dict.items()) |
There was a problem hiding this comment.
Thank you very much. Thank to you guys I'm learning and improving my knowledge of python. Thanks!
| } | ||
| return args | ||
|
|
||
| def get_args_4_rename_table_only_columns(self, table_name, new_column_names_dict): |
There was a problem hiding this comment.
The only difference between this function and get_args_4_rename_table_all() is that this one does not include "new_table_name" in args. I don't see why it would be desirable to implement two functions that duplicate each other to such an extent.
There was a problem hiding this comment.
Suggested change applied
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
|
|
||
| try: | ||
| result = APTable.read(io.BytesIO(gzip.decompress(data.read())), format=astropy_format) | ||
| except gzip.BadGzipFile: |
There was a problem hiding this comment.
We still support python3.7, and BadGzipFile has only been added into 3.8, so a workaround is needed here. I see two ways, but maybe there are others (I don't have a preference of which one you do):
-
- except on
OSErrorasBadGzipFileinherits for that one. Based on the docs,EOFErrororzlib.errorcould be options, too.
- except on
-
Add a custom exception based on the python version. Somewhat ugly, but it is the closest to what you're checking right now, and would allow a trivial cleanup once 3.7 support is dropped.
try:
# We need python > 3.7 for this
from gzip import BadGzipFile
except ImportError:
class BadGzipFile(OSError):
pass
There was a problem hiding this comment.
Thank you very much. Working on this right now
There was a problem hiding this comment.
change applied. thank you again for the suggestion. very useful
| elif 'votable_plain' == output_format: | ||
| return 'ascii' |
There was a problem hiding this comment.
Would this be a simple ascii table, or votable format? If the latter why not use votable?
There was a problem hiding this comment.
The reason for this was that we detected that when working with a JupyterNotebook, if you did a request with output_format 'votable_plain' and you wanted to display it as astropy table, it crashed because astropy did't recognise the format. By converting the format 'votable_plain' to 'ascii' the issue is solved.
There was a problem hiding this comment.
Can you provide a minimal code example?
There was a problem hiding this comment.
Sorry, but with type of example? Do you want the JupiterNotebook with the error returned by astropy table if we don't do that conversion? Honestly I would not like to rollback the code just to show this but I have to, I will do it of course.
There was a problem hiding this comment.
This is what I tried with the development version of astroquery:
from astropy import units as u
from astropy.coordinates import SkyCoord
from astroquery.gaia import Gaia
Gaia.cone_search(SkyCoord(0*u.deg, 0*u.deg), radius=0.1*u.deg, columns=['ra', 'dec'], output_format='votable_plain')The outcome was:
IORegistryError: No reader defined for format 'votable_plain' and class 'Table'.Is this what you were talking about? If it is then wouldn't the correct replacement format for 'votable_plain' be 'votable' instead of 'ascii'?
There was a problem hiding this comment.
Yes, just tested and it works as well. The reason because we change it to 'ascii' was that it was the format proposed by the error given back by Astropy. (Here is an example)
~/opt/anaconda3/envs/gaia-astroquery-1.0/lib/python3.8/site-packages/astropy/io/registry/core.py in get_reader(self, data_format, data_class)
141 else:
142 format_table_str = self._get_format_table_str(data_class, 'Read')
--> 143 raise IORegistryError(
144 "No reader defined for format '{}' and class '{}'.\n\nThe "
145 "available formats are:\n\n{}".format(
IORegistryError: No reader defined for format 'votable_plain' and class 'Table'.
The available formats are:
Format Read Write Auto-identify Deprecated
--------------------------- ---- ----- ------------- ----------
ascii Yes Yes No
ascii.aastex Yes Yes No
ascii.basic Yes Yes No
ascii.cds Yes No No
ascii.commented_header Yes Yes No
ascii.csv Yes Yes Yes
ascii.daophot Yes No No
ascii.ecsv Yes Yes Yes
ascii.fast_basic Yes Yes No
ascii.fast_commented_header Yes Yes No
ascii.fast_csv Yes Yes No
ascii.fast_no_header Yes Yes No
ascii.fast_rdb Yes Yes No
ascii.fast_tab Yes Yes No
ascii.fixed_width Yes Yes No
ascii.fixed_width_no_header Yes Yes No
ascii.fixed_width_two_line Yes Yes No
ascii.html Yes Yes Yes
ascii.ipac Yes Yes No
ascii.latex Yes Yes Yes
ascii.mrt Yes Yes No
ascii.no_header Yes Yes No
ascii.qdp Yes Yes Yes
ascii.rdb Yes Yes Yes
ascii.rst Yes Yes No
ascii.sextractor Yes No No
ascii.tab Yes Yes No
asdf Yes Yes Yes
fits Yes Yes Yes
hdf5 Yes Yes Yes
pandas.csv Yes Yes No
pandas.fwf Yes No No
pandas.html Yes Yes No
pandas.json Yes Yes No
parquet Yes Yes Yes
votable Yes Yes Yes
aastex Yes Yes No Yes
cds Yes No No Yes
csv Yes Yes No Yes
daophot Yes No No Yes
html Yes Yes No Yes
ipac Yes Yes No Yes
latex Yes Yes No Yes
mrt Yes Yes No Yes
rdb Yes Yes No Yes
So checking the issue with the support archive scientist we thought that perhaps the closest format from the suggested ones was 'ascii'. Of course 'votable' works so no problem to change it to 'votable' if you consider that it's a better solution. Thank you in advance
There was a problem hiding this comment.
Of course 'votable' works so no problem to change it to 'votable' if you consider that it's a better solution.
That sounds good to me and is also what @bsipocz suggested in #2077 (comment).
There was a problem hiding this comment.
Hmm, interesting that the votable_plain returned something that was readable as an ascii table. Afaik a votable is xml-based, so I'm a bit puzzled. OTOH, it's a server-side naming of the format, so whatever works for reading in that output is fine.
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
|
One of the pulls/rebases picked up unrelated changes, and I see a handful of duplicated commits, too (usually a side effect of not force pushing a rebased branch back to remote but pulling it again). Anyway, this should be very close to being finish, so I go ahead and do a rebase to get rid of these issues and will come back and reasses CI and merge if all is clear. |
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
98c36c1 to
dccf61a
Compare
|
@mhsarmiento - I've rebased and removed the duplicated commits, and now it became quite visible that the first few commits were made by someone who didn't set their email address to be recognized by github. Please let me know if you would like to fix commits by |
bsipocz
left a comment
There was a problem hiding this comment.
This can be merged once my minor last question about contributor git history credits are answered.
ef0b9d2 to
7a932ea
Compare
development now it is possible to rename the name of the user table and/or its columns
corrects the pythest for the new funcionality "rename_table"
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
7a932ea to
379c76c
Compare
Updated file CHANGES.rst with the new enhancements added to the current version.
- Rename the output file name given by the user with the correct extension (By default, fits and votable results are returned in compressed format) - Improves the help for methods 'launch_job' and 'launch_job_async'. The help for the parameter 'name' was missing in those methods.
…information for method 'gaia_load' data.
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
gzip.BadGzipFile with python 3.7. Now except OSError is being used as a workaround
…tropy_format', the corresponding return type for 'votable_plain' has been changed to 'votable'
379c76c to
c353e00
Compare
Dear @bsipocz ,I have already fix the commits by |
|
Thank you @mhsarmiento! |
…opy#2077 and implements ticket GAIAPCR-1040:C9APP-149 Astropy error when working with tables downloaded in plain votable, fits, json, and ecsv formats
this new pull request contain a new enhancement for the packages 'utils.tap' and 'Gaia.' This new enhancements includes the capability to rename a user table and/or its columns.