Skip to content

fix: from csv behaviour#646

Merged
JoanFM merged 13 commits into
mainfrom
fix-from-csv-behaviour
Oct 20, 2022
Merged

fix: from csv behaviour#646
JoanFM merged 13 commits into
mainfrom
fix-from-csv-behaviour

Conversation

@anna-charlotte

Copy link
Copy Markdown
Contributor

Goals:

@anna-charlotte anna-charlotte marked this pull request as draft October 18, 2022 14:24
@codecov

codecov Bot commented Oct 18, 2022

Copy link
Copy Markdown

Codecov Report

Merging #646 (cde6109) into main (1c96605) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #646      +/-   ##
==========================================
+ Coverage   86.45%   86.46%   +0.01%     
==========================================
  Files         133      133              
  Lines        6755     6761       +6     
==========================================
+ Hits         5840     5846       +6     
  Misses        915      915              
Flag Coverage Δ
docarray 86.46% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
docarray/array/mixins/io/from_gen.py 85.24% <100.00%> (+1.60%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@JoanFM

JoanFM commented Oct 19, 2022

Copy link
Copy Markdown
Member

I believe this can be solved inside from_csv itself checking if cls is an Instance or a class.

@anna-charlotte

anna-charlotte commented Oct 20, 2022

Copy link
Copy Markdown
Contributor Author

Thanks for the review, @JoanFM! I think, unfortunately, cls is always a class, even when the classmethod is called from an instance. Therefore, I had a quick chat with Johannes yesterday and he recommended me to follow the approach from jina-ai/serve#4787. Let me know if I’m missing something.

Comment thread docarray/array/mixins/io/from_gen.py Outdated
@JoanFM

JoanFM commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

@JohannesMessner

JohannesMessner commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

@JoanFM

JoanFM commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

@JohannesMessner

JohannesMessner commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds.
But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

@samsja

samsja commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds. But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

I agree with Johannes it would be better to add it in the Mixin __init__ but if it is not possible because of inheritance problem we can put in in the DA init

@JoanFM

JoanFM commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds. But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

I agree with Johannes it would be better to add it in the Mixin __init__ but if it is not possible because of inheritance problem we can put in in the DA init

Do not put it there. Mixins should be stateless

@JohannesMessner

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds. But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

I agree with Johannes it would be better to add it in the Mixin __init__ but if it is not possible because of inheritance problem we can put in in the DA init

Do not put it there. Mixins should be stateless

But by moving it to the base class we don't avoid the state, we just move it to somewhere else. The only thing that changes is that the assignment happens somewhere where we don't know the thing that we assign...

But not willing to die in this hill, @anna-charlotte you can choose who to listen to ;)

@JoanFM

JoanFM commented Oct 20, 2022

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds. But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

I agree with Johannes it would be better to add it in the Mixin __init__ but if it is not possible because of inheritance problem we can put in in the DA init

Do not put it there. Mixins should be stateless

But by moving it to the base class we don't avoid the state, we just move it to somewhere else. The only thing that changes is that the assignment happens somewhere where we don't know the thing that we assign...

But not willing to die in this hill, @anna-charlotte you can choose who to listen to ;)

What doesn't need state is the Mixin, not the class.

@JoanFM

JoanFM commented Oct 20, 2022

Copy link
Copy Markdown
Member

Anyaway, the error came from not passing positional args

@JohannesMessner

Copy link
Copy Markdown
Member

Do not add an init to the Mixin. DO that in the init of DocArray Base class

Are we sure about this? It would assign some methods that don't exist there and break separation of concerns. It seems to be accepted practice to let mixins have inits and pass **kwargs between them: https://stackoverflow.com/a/50465583

multiple inheritance is not so easily supported in Python I think. What is ur super when u call super?

My understanding is that super() picks a certain evaluation order along which it tries to find an __init__ to call, and then it call the first it finds. But if that __init__ in turn calls super().__init__(**kwargs) then it will keep 'walking' along that evaluation order. So as long as every init in a mixin makes that call it should all work as intended. The mixins just need to be set up to not break this chain.

I agree with Johannes it would be better to add it in the Mixin __init__ but if it is not possible because of inheritance problem we can put in in the DA init

Do not put it there. Mixins should be stateless

But by moving it to the base class we don't avoid the state, we just move it to somewhere else. The only thing that changes is that the assignment happens somewhere where we don't know the thing that we assign...
But not willing to die in this hill, @anna-charlotte you can choose who to listen to ;)

What doesn't need state is the Mixin, not the class.

I don't agree, the only 'state' we have is which implementation of form_csv we pick, and this is tied to the mixin, not anything else.

@JoanFM JoanFM marked this pull request as ready for review October 20, 2022 13:54

@JoanFM JoanFM left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, not approved, let's raise Exception

@JoanFM JoanFM marked this pull request as draft October 20, 2022 13:54
Comment thread docarray/array/mixins/io/from_gen.py Outdated
"""
warnings.warn(
'Calling from_csv() from a DocumentArray instance does not change the instance in-place. Instead of calling from_csv() from a DocumentArray instance, you should probably use `DocumentArray.from_csv(...)`.'
raise Exception(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not any Exception, we need to raise the same Exception pandas raise when trying to load from_csv from pandas.Dataframe instance

@anna-charlotte anna-charlotte Oct 20, 2022

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find a from_csv in pandas, only pd.read_csv (no classmethod) or classmethods like from_dict. from_dict does not throw an exception when being called from an instance, I believe. Am I missing something?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea it that when you try to do

import pandas as pd
df = pd.DataFrame()
df.read_csv()
>>> AttributeError: 'DataFrame' object has no attribute 'read_csv'

it will fail because the df object does not have the read_csv method. We want to mimic the behavior but I am not sure if we should have exactly the same error message.

@JoanFM what about raising
AttributeError: 'from_csv can't be called from a DocumentArray instance directly but rather from the DocumentArray class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you! Changed it now.

@samsja samsja left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far ! Next step is to make it a bit more generic so that it work for the all of the from_xxx classmethod

@anna-charlotte

Copy link
Copy Markdown
Contributor Author

I might move the generalization regarding all of the from_xxx classmethods to a new PR, since it has taken me quite long now, so that this one can be merged as soon as it is ready.

@JoanFM JoanFM marked this pull request as ready for review October 20, 2022 16:55
@JoanFM JoanFM merged commit 4ad2398 into main Oct 20, 2022
@JoanFM JoanFM deleted the fix-from-csv-behaviour branch October 20, 2022 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: unintuitive behavior from csv method return a da but does not modify in place

4 participants