scalar numeric ops with np.nan are inconsistent w.r.t. return type #368

Open
jreback opened this issue Sep 21, 2015 · 6 comments

@jreback

jreback commented Sep 21, 2015

In [9]: nd.array([None,None],dtype='?float64') + np.nan
Out[9]: 
nd.array([NA, NA],
         type="2 * ?float64")

In [10]: nd.array([1.,2.],dtype='?float64') + np.nan
Out[10]: 
nd.array([NA, NA],
         type="2 * ?float64")

# shouldn't this be the same as [10] ? or is ``np.nan`` just not 'missing' for ``float64``?
In [11]: nd.array([1.,2.],dtype='float64') + np.nan
Out[11]: 
nd.array([nan, nan],
         type="2 * float64")
@jreback
Author

jreback commented Sep 21, 2015

cc @izaid

@izaid
Member

izaid commented Sep 21, 2015

Noted. Want to weigh in, @mwiebe?

@mrocklin

I perceive 11 as the proper behavior. ?float64 should still support proper nan. Perhaps dynd needs an nd.NA singleton.
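
One way to read the `nd.NA` suggestion is sketched below in plain Python. `NAType` and `NA` are hypothetical names used only for illustration, not DyND API, and only addition is modelled. The point is that a dedicated singleton can propagate through arithmetic while an ordinary `nan` stays an ordinary `nan`.

```python
import math


class NAType:
    """Hypothetical missing-value singleton, distinct from float NaN."""

    _instance = None

    def __new__(cls):
        # Always hand back the same object so `x is NA` is a valid test.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "NA"

    def __add__(self, other):
        # Missing plus anything is still missing.
        return self

    # float.__add__ returns NotImplemented for NAType, so Python falls
    # back to this reflected method for expressions like `1.0 + NA`.
    __radd__ = __add__


NA = NAType()

values = [1.0, NA, float("nan")]
result = [v + 1.0 for v in values]
print(result)  # [2.0, NA, nan]
```

In this sketch `NA + float("nan")` also yields `NA`; whether that is the desired rule is exactly the kind of detail a real design would have to decide.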

@mwiebe
Member

mwiebe commented Sep 21, 2015

11 makes sense to me as well. The choice that NaN counts as NA even when it isn't the specific tagged NaN means that the details of the interaction between NaN and NA are a bit tricky to get right.

@jreback
Author

jreback commented Sep 22, 2015

So let me see if I understand the model.

Internally DyND uses NA (as a repr), which is a dtype-specific sentinel. When you pass, say, a Python None, it gets converted.

What is the sentinel for float then? I thought it was np.nan. If so, why are None and np.nan not on a par when coerced to float?

I would expect the following to give the same result as [10] (from above), and I reiterate that [10] and [11] should be equal from a consistency perspective.

In [2]: nd.array([1.,2.],dtype='float64') + None  
RuntimeError: could not convert python object of type <type 'NoneType'> into a dynd array

Otherwise you have a much more complicated scenario where nan and None mean different things, which would be a bit odd (and would be a break from numpy).

In [3]: arr = np.array([1.,2.])

In [4]: arr
Out[4]: array([ 1.,  2.])

In [5]: arr[0] = None

In [6]: arr
Out[6]: array([ nan,   2.])

@mwiebe
Member

mwiebe commented Sep 22, 2015

For floating point types, there are a large number of NaNs that all get lumped together in typical processing. One particular NaN is chosen as NA, and the rest get to stay NaN. But, for compatibility with pandas and also what I recall from R's behaviour, any NaN returns false to is_avail even when it is not the specific sentinel NA. For the latter reason, I think NaNs are currently printing as NA even though they should print as nan, and other corner cases are also wrong.

It's probably worth writing a design doc somewhere that spells out the desired theoretical model for the NA/NaN interaction and explores how that model trips up the implementation.
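
The distinction described above, one reserved NaN bit pattern acting as NA while every other NaN stays a plain NaN, can be illustrated with the standard library alone. This is a rough sketch under assumptions: `NA_BITS` is an arbitrary quiet-NaN payload picked for the example (not necessarily the pattern DyND uses), and `is_na_strict` and this `is_avail` are hypothetical helpers that mimic the two rules discussed in the thread.

```python
import math
import struct

# Hypothetical sentinel: a quiet NaN with an arbitrary payload, reserved as "NA".
# The payload is an assumption for illustration only.
NA_BITS = 0x7FF80000000007A2


def bits_of(x):
    """Return the 64-bit IEEE 754 pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def from_bits(bits):
    """Build a Python float from a 64-bit IEEE 754 pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


NA = from_bits(NA_BITS)
plain_nan = float("nan")


def is_na_strict(x):
    """Strict rule: only the exact sentinel bit pattern counts as NA."""
    return bits_of(x) == NA_BITS


def is_avail(x):
    """Lenient rule from the thread: any NaN at all counts as missing."""
    return not math.isnan(x)


for name, value in [("NA", NA), ("plain nan", plain_nan), ("1.0", 1.0)]:
    print(name, hex(bits_of(value)), is_na_strict(value), is_avail(value))
```

Both `NA` and `plain_nan` satisfy `math.isnan`, so the lenient rule cannot tell them apart. Only a comparison of the raw bits can, which is why printing and arithmetic need to agree on which of the two rules applies.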
