Missions mast#2095
Conversation
|
Hello @syed-gilani! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2022-02-24 03:44:33 UTC |
There was a problem hiding this comment.
First, I recommend you change this to a WIP PR until this service gets deployed to ops at STScI, as it cannot be added to astroquery before then.
I've mentioned a few specifics in line, but before I really go through it in detail I want to address the larger picture items that I think need to be addressed.
First, I think we may have had a misunderstanding with regards to the overall design of the new functionality. I am confused as to why the user-facing class is called "Datasets" and the non-user-facing class is called "MissionMast". My understanding had been that the user-facing class would be astroquery.mast.MissionMast or similar, because that is more descriptive than "Datasets" and also matches the nomenclature we are using for the mission-mast interface itself.
In terms of the MissionsSearchAPI class, it is not clear to me why this cannot be folded into the existing ServiceAPI, the structure of the url looks pretty similar, and it would be preferable to generalize the ServiceAPI to also function for the mission interfaces if possible.
For the DatasetsClass, this interface does not follow the API specs required by astroquery. The functions should be along the lines of those provided by the Observations class, "query_object", "query_region", "query_criteria" etc. Also the functions should take advantage of the **kwargs option in python so that the user can supply filters as individual arguments rather than a single json dictionary.
Additionally, while you've called this "Datasets" generally, it only accesses the HST mission search, so there needs to be a way for the user to select the specific mission to search within, and a way to easily expand the code for future mission searches.
For the documentation, you've added the new search capabilities in the middle of the Observations section. There should be a new section called "Mission Searches" or whatever, and it needs to explain more clearly why a user would be interested in these searches and give links to the relevant STScI documentation.
Codecov Report
@@ Coverage Diff @@
## main #2095 +/- ##
==========================================
+ Coverage 62.84% 62.93% +0.08%
==========================================
Files 130 131 +1
Lines 16841 16917 +76
==========================================
+ Hits 10584 10646 +62
- Misses 6257 6271 +14
Continue to review full report at Codecov.
|
ceb8
left a comment
There was a problem hiding this comment.
Overall the functionality is quite good, although the code needs some tweaking.
The mocked tests look good. Remoted tests will also be necessary.
The docs still reflect the old Datasets functionality, so they need to be updated. Also the description of this PR needs to be updated to reflect what's in it now.
Just a suggestion, but since this is part of a larger MAST effort it might be a good idea to make an issue with the list of things MAST plans to implement and checky boxes so that users can see what's coming and track progress.
| col_type = np.int64 | ||
| ignore_value = -999 if (ignore_value is None) else ignore_value | ||
| elif col_type == "double" or col_type == "float" or col_type == "DECIMAL": | ||
| elif col_type == "double" or col_type.lower() == "float" or col_type == "DECIMAL": |
There was a problem hiding this comment.
I wonder if it would be safer to make all of our comparisons in this function case-insensitive.
There was a problem hiding this comment.
The to_lower() here addresses this one case. I'm guessing the comment was intended to be broader, perhaps getting the lower case version of col_type at the beginning of the loop, then having all the comparisons be against lower case, e.g., "varchar", "null", "decimal".
There was a problem hiding this comment.
I do still worry about all the variations in capitalization for col_type. It would be nice if we were consistent on the server, but that's not the case. Creating or pointing to documentation for the possible values I think could really help. I also think treating all values as lowercase here would be a little safer.
There was a problem hiding this comment.
I don't see a trace of discussion about this review comment, neither changes to the code. Please don't click "resolve conversation" without actually resolving it. If discussion happened offline, then leave a summary of it at the relevant place. Thanks!
There was a problem hiding this comment.
OK, I see the to lower has been moved to the top of the loop, so I am going to mark this conversation as resolved.
tomdonaldson
left a comment
There was a problem hiding this comment.
Thanks for this work. I think it's looking quite good, but I did have a few questions/comments.
Since I'm unfamiliar with the existing code relating to catalogs, I'd also like to do a deeper dive to explore some of the usage and test coverage, but I didn't want to delay these initial comments for that.
| col_type = np.int64 | ||
| ignore_value = -999 if (ignore_value is None) else ignore_value | ||
| elif col_type == "double" or col_type == "float" or col_type == "DECIMAL": | ||
| elif col_type == "double" or col_type.lower() == "float" or col_type == "DECIMAL": |
There was a problem hiding this comment.
The to_lower() here addresses this one case. I'm guessing the comment was intended to be broader, perhaps getting the lower case version of col_type at the beginning of the loop, then having all the comparisons be against lower case, e.g., "varchar", "null", "decimal".
| ignore_value = None | ||
| # making type adjustments | ||
| if col_type == "char" or col_type == "STRING": | ||
| if col_type == "char" or col_type == "STRING" or 'VARCHAR' in col_type or col_type == "NULL": |
There was a problem hiding this comment.
It could be useful to have a brief comment (maybe in the doc string?) explaining the possible values of col_type and which service(s) they might come from.
There was a problem hiding this comment.
please do address this, a comment in the code would be sufficient I think
| .. code-block:: python | ||
|
|
||
| >>> from astroquery.mast import Datasets | ||
| >>> params = {"target": ["40.66963 -0.01328"], |
There was a problem hiding this comment.
The documentation seems a little incomplete to me. As a small example, this code sample doesn't actually do a query. In general as a user, I'd be looking for the following things:
- At a high level, how do the missions searches differ from the others (i.e., when might I choose to use the missions search?).
- What queries are available to me?
- What do each of the keys in the params dict do? (And do astroquery users need to know about all the params shown here?)
- What is the syntax for specifying "conditions"?
- (A pointer to) documentation about what columns are available, along with their types so I know what conditions would be applicable.
- Can I do a follow-up query to retrieve data products?
|
@tomdonaldson - maybe you want to ping Erik yet again to add @syed-gilani to the org, so we don't need to approve their each and every commit... |
|
Just sent an invite to @syed-gilani to the org. I did not add them to any of the teams, as I think just org membership is enough to not require the approval. But if there's a team I should add them to, I can do that. (Although I think y'all have admin powers over the astroquery teams?) |
tomdonaldson
left a comment
There was a problem hiding this comment.
Thanks for the updates. This is looking really good.
I've added some minor suggestions for the docs and code coverage.
| >>> missions.service | ||
| 'search' | ||
|
|
||
| missions object can be used to search meta data usiong region coordinates. the kwargs can be |
There was a problem hiding this comment.
| missions object can be used to search meta data usiong region coordinates. the kwargs can be | |
| The missions object can be used to search meta data using region coordinates. the kwargs can be |
|
|
||
| missions object can be used to search meta data usiong region coordinates. the kwargs can be | ||
| used to specify other filters like selec_cols and sort_by. The available column names to filter and | ||
| their description for can be found here https://vao.stsci.edu/missionmast/tapservice.aspx/tables#folder38 |
There was a problem hiding this comment.
| their description for can be found here https://vao.stsci.edu/missionmast/tapservice.aspx/tables#folder38 | |
| their descriptions can be found here https://vao.stsci.edu/missionmast/tapservice.aspx/tables#folder38 (xml) |
| 'search' | ||
|
|
||
| missions object can be used to search meta data usiong region coordinates. the kwargs can be | ||
| used to specify other filters like selec_cols and sort_by. The available column names to filter and |
There was a problem hiding this comment.
| used to specify other filters like selec_cols and sort_by. The available column names to filter and | |
| used to specify output characteristics like select_cols and sort_by. The available column names to filter and |
|
|
||
| .. code-block:: python | ||
|
|
||
| >>> results = missions.query_criteria(select_cols=["sci_stop_time", "sci_targname", "sci_start_time", "sci_status"], sort_by=['sci_targname']) |
There was a problem hiding this comment.
Was this example meant to include some constraints on column values? It would be good to see a numeric and discrete example, I think.
|
|
||
| # adding additional user specified parameters | ||
| for prop, value in kwargs.items(): | ||
| params[prop] = value |
There was a problem hiding this comment.
This seems fairly fundamental. Should there be test case(s) for params[prop] = value?
| params["radius_units"] = 'arcminutes' | ||
|
|
||
| if not self._service_api_connection.check_catalogs_criteria_params(criteria): | ||
| raise InvalidQueryError("At least one non-positional criterion must be supplied.") |
There was a problem hiding this comment.
If you're ok with missing unit test, I will add it in one of the future updates to this interface.
tomdonaldson
left a comment
There was a problem hiding this comment.
I think this looks good now. Test coverage improvements can be handled in a subsequent PR.
|
@bsipocz I think this is ready for a final look. We've discussed the remaining coverage issues and agreed to file an internal ticket to address them in a future PR. Please advise if you would also like to track that with a github issue. |
bsipocz
left a comment
There was a problem hiding this comment.
Thanks @syed-gilani! This is coming along nicely.
Some of my comments are extreme nitpicking, but there are a few ones that should be addressed along with comments left unaddressed in previous reviews from others.
| service_dict = {self.service: {'path': self.service, 'args': {}}} | ||
| self._service_api_connection.set_service_params(service_dict, f"{self.service}/{self.mission}") | ||
|
|
||
| def set_service(self, service): |
There was a problem hiding this comment.
I may be missing something, so I like to see why this, as well as set_mission was done this way, rather than handling it as part of __init__.
There was a problem hiding this comment.
I think these (both set_service() and set_mission()) are not required anymore as the service can be set at initialization.
|
@bsipocz For making all searchable fields visible in the query_criteria, I will have to add more than 50 arguments to the query_criteria method. There is an endpoint that returns the list of all columns that can be used to filter the search, This is the URL https://mast.stsci.edu/search/util/api/v0.1/column_list. I am not sure if adding more than 50 arguments to a method is a good idea |
|
@bsipocz Another point to consider is that the name of searchable fields will change from mission to mission. Right now, We only support hst mission but we are working on classy mission and it will be released soon. If we leave it as conditions then we don't have to provide multiple query_citeria for each mission with a different set of arguments |
Thanks, that's good to know and I agree that in that case it's not something practical. I suppose it would be useful to add a comment about it in the code, so if anyone else also runs into it (or uses mast as an example implementation) they know that it's more the exception for this case. Also, I wonder whether it would make sense, or just complicate things to check the validity of kwargs. As I said one of my main concerns is that invalid keywords can be passed this way which may result cases where the user assumes a certain behaviour but in fact, their input is totally ignored by the code. Addressing this goes beyond this PR. |
| service_dict = {self.service: {'path': self.service, 'args': {}}} | ||
| self._service_api_connection.set_service_params(service_dict, f"{self.service}/{self.mission}") | ||
|
|
||
| def set_service(self, service): |
There was a problem hiding this comment.
I think these (both set_service() and set_mission()) are not required anymore as the service can be set at initialization.
| ignore_value = None | ||
| # making type adjustments | ||
| if col_type == "char" or col_type == "STRING": | ||
| if col_type == "char" or col_type == "STRING" or 'VARCHAR' in col_type or col_type == "NULL": |
| col_type = np.int64 | ||
| ignore_value = -999 if (ignore_value is None) else ignore_value | ||
| elif col_type == "double" or col_type == "float" or col_type == "DECIMAL": | ||
| elif col_type == "double" or col_type.lower() == "float" or col_type == "DECIMAL": |
There was a problem hiding this comment.
I don't see a trace of discussion about this review comment, neither changes to the code. Please don't click "resolve conversation" without actually resolving it. If discussion happened offline, then leave a summary of it at the relevant place. Thanks!
|
@bsipocz All things identified in the feedback have been resolved. Thanks for all the feedback and let me know if there is anything else you want me to change before this PR can be merged. |
|
@bsipocz The last outstanding issue that required making mission and service keyword only argument was fixed a few days ago. Is there anything else that you want changed before this PR can be approved? |
| Default 3 degrees. | ||
| The string must be parsable by `~astropy.coordinates.Angle`. The | ||
| appropriate `~astropy.units.Quantity` object from | ||
| `~astropy.units` may also be used. Defaults to 0.2 deg. |
There was a problem hiding this comment.
It says here Defaults to 0.2 deg but the default is 3 correct?
| @class_or_instance | ||
| def query_criteria_async(self, **criteria): | ||
| """ | ||
| Given an set of filters, returns a list of catalog entries. |
There was a problem hiding this comment.
Agreed, I think people might get confused by what filters means in this context.
| Default 3 arcmin. | ||
| The string must be parsable by `~astropy.coordinates.Angle`. | ||
| The appropriate `~astropy.units.Quantity` object from | ||
| `~astropy.units` may also be used. Defaults to 0.2 deg. |
There was a problem hiding this comment.
The same Defaults to 0.2 deg is here
|
Hi @syed-gilani , I'll be taking over from Clara in maintaining astroquery's MAST module so I took a look through your PR and it looks like a really useful interface. I had a couple of comments and questions that you can look at before the merge. Functionally this is looking good. Thanks for your time. Jenny |
|
I started to go over this again and noticed that there are a few still unaddressed review comments (going back to all the way back to the first reviews from Clara). Some of those may not be relevant anymore, but I would find it useful if any trace of that would be left in the review comments, e.g. comments on the review asks for a Anyway, this has lingered here long enough, so let's say. If both @tomdonaldson and @jaymedina signs-off on it, I'll merge with the understanding the follow-up PRs will come that adds e.g. the tests and any more outstanding comments from this review. |
| Valid criteria are coordinates, objectname, radius (as in `query_region` and `query_object`), | ||
| and all fields listed in the column documentation for the mission being queried. | ||
| Fields that can be used to match results on criteria. See the TAP schema link below for all field names. | ||
| https://vao.stsci.edu/missionmast/tapservice.aspx/tables#folder38 |
There was a problem hiding this comment.
This is not a requirement for the PR, but @tomdonaldson, I wonder maybe there is a human-readable version of this (or a link to a tool that does that, with that XML preloaded into it)? (e.g the workflow of astroquery users to look at the XML of that link to dig up field names feels a bit clunky to me)
There was a problem hiding this comment.
I totally agree with this comment. Unfortunately we don't support a clean human-readable format for this, and I'd be hesitant to point to a 3rd-party formatter.
@syed-gilani it's not obvious to me what the #folder38 part of the URL does. Are there environments where that would limit what parts of the TAP Schema are shown?
|
This looks good. @syed-gilani can you give us an update on the results from running |
|
@syed-gilani I'm also good with these changes as they exist today, but there are conflicts with the main branch. @bsipocz I'm assuming a rebase is in order before final approval? |
|
@tomdonaldson I resolved the conflicts and also removed the #folder38 tag from the TAP schema URL |
Thanks for these updates! |
Removed the html tag from the TAP schema link
…criptions for a mission
c5788ed to
200af24
Compare
|
Thank you @syed-gilani! |
This PR includes a class that makes POST API call to https://masttest.stsci.edu/search/hst/api/v0.1/search with json data that contains the search criteria. See a sample of json data below:
{"target":[""],"radius":3,"radius_units":"arcminutes","select_cols":["ang_sep","sci_aper_1234","sci_central_wavelength","sci_data_set_name","sci_dec","sci_actual_duration","sci_spec_1234","sci_hlsp","sci_instrume","sci_preview_name","sci_pep_id","sci_ra","sci_refnum","sci_release_date","scp_scan_type","sci_start_time","sci_stop_time","sci_targname"],"user_fields":[],"conditions":[{"sci_obs_type":"all"},{"sci_aec":"S"},{"sci_instrume":"stis,acs,wfc3,cos,fgs,foc,fos,ghrs,hsp,nicmos,wfpc,wfpc2"},{"sci_data_set_name":""},{"sci_pep_id":""},{"sci_pi_last_name":""},{"sci_actual_duration":""},{"sci_spec_1234":""},{"sci_release_date":""},{"sci_start_time":""}],"limit":5000,"offset":0,"sort_by":[],"sort_desc":[],"skip_count":false}
It then converts the json response from the API call into astropy Table.