scalar numeric ops with np.nan are inconsistent w.r.t. return type #368

jreback · 2015-09-21T00:48:05Z

In [9]: nd.array([None,None],dtype='?float64') + np.nan
Out[9]: 
nd.array([NA, NA],
         type="2 * ?float64")

In [10]: nd.array([1.,2.],dtype='?float64') + np.nan
Out[10]: 
nd.array([NA, NA],
         type="2 * ?float64")

# shouldn't this be the same as [10] ? or is ``np.nan`` just not 'missing' for ``float64``?
In [11]: nd.array([1.,2.],dtype='float64') + np.nan
Out[11]: 
nd.array([nan, nan],
         type="2 * float64")

jreback · 2015-09-21T00:48:12Z

cc @izaid

izaid · 2015-09-21T00:52:23Z

Noted. Want to weigh in, @mwiebe?

mrocklin · 2015-09-21T15:24:32Z

I perceive 11 as the proper behavior. ?float64 should still support proper nan. Perhaps dynd needs an nd.NA singleton.

mwiebe · 2015-09-21T23:10:11Z

11 makes sense to me as well. The choice that NaN counts as NA even when it isn't the specific tagged NaN makes it a bit tricky to get all the details of the interaction between NaN and NA right.

jreback · 2015-09-22T11:22:10Z

so let me see if I understand the model.

so internally DyND uses NA (as a repr), which is a dtype-specific sentinel. When passing, say, a Python None, it gets converted.

What is the sentinel for float then? I thought it was np.nan? If so, then why are None and np.nan not on a par when coerced to float?

I would think this would convert to [10] (from above), and I reiterate that [10] and [11] should be equal from a consistency perspective.

In [2]: nd.array([1.,2.],dtype='float64') + None  
RuntimeError: could not convert python object of type <type 'NoneType'> into a dynd array

otherwise you have a much more complicated scenario where nan and None mean different things, which would be a bit odd (and this would be a break from numpy).

In [3]: arr = np.array([1.,2.])

In [4]: arr
Out[4]: array([ 1.,  2.])

In [5]: arr[0] = None

In [6]: arr
Out[6]: array([ nan,   2.])

mwiebe · 2015-09-22T17:46:32Z

For floating point types, there are a large number of NaNs that all get lumped together in typical processing. One particular NaN is chosen as NA, and the rest get to stay NaN. But, for compatibility with pandas and also what I recall from R's behaviour, any NaN returns false to is_avail even when it is not the specific sentinel NA. For the latter reason, I think NaNs are currently printing as NA even though they should print as nan, and other corner cases are also wrong.
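
To make the distinction concrete, here is a minimal sketch in plain Python (not DyND code) of one tagged NaN acting as NA while every other NaN stays an ordinary NaN. The bit pattern `NA_BITS` and the helper names `bits_to_float`, `float_to_bits`, and `is_na_sentinel` are assumptions made up for illustration; `is_avail` here only models the behaviour described above and is not the library's implementation.

```python
import math
import struct

# Hypothetical sentinel: a quiet NaN with an arbitrary payload (0x7A2).
# This bit pattern is chosen for illustration only; it is not taken from DyND.
NA_BITS = 0x7FF80000000007A2


def bits_to_float(bits):
    """Reinterpret a 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def float_to_bits(x):
    """Reinterpret an IEEE 754 double as a 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


NA = bits_to_float(NA_BITS)


def is_na_sentinel(x):
    """True only for the one tagged NaN chosen as NA."""
    return float_to_bits(x) == NA_BITS


def is_avail(x):
    """Model of the behaviour described above: any NaN counts as unavailable."""
    return not math.isnan(x)


plain_nan = float("nan")

print(math.isnan(NA), math.isnan(plain_nan))          # both values are NaN
print(is_na_sentinel(NA), is_na_sentinel(plain_nan))  # only NA carries the tag
print(is_avail(NA), is_avail(plain_nan), is_avail(1.0))
```

In this model, printing should key off `is_na_sentinel` (show NA) while availability checks key off `is_avail`, which is where the two notions can drift apart. Whether the tag survives arithmetic such as `NA + 1.0` may depend on the platform, so the sketch makes no claim about that.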

It's probably worth writing out the theoretical model desired for the missing value NA/NaN interaction and some exploration of how that trips up the implementation in a design doc somewhere.
