The invalid type promotion error with time-series data

from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=3)  # the default is n_splits=5

for train_index, test_index in tscv.split(train):
    print("TRAIN:", train_index, "Validation:", test_index)
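To make the behaviour of the splitter concrete, here is a minimal sketch on a toy array. The names `X_toy`, `tscv_toy`, and `folds` are made up for illustration and are not from my actual data; the only assumption is that the rows are already sorted by time.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data (hypothetical): 8 samples, assumed to be sorted by time already
X_toy = np.arange(8).reshape(-1, 1)

tscv_toy = TimeSeriesSplit(n_splits=3)

# Collect the index arrays of every fold as plain lists
folds = [(tr.tolist(), va.tolist()) for tr, va in tscv_toy.split(X_toy)]

for train_idx, valid_idx in folds:
    print("TRAIN:", train_idx, "Validation:", valid_idx)
```

Each fold trains only on rows that come before its validation rows, and the training window grows from fold to fold, which is why this splitter suits time-ordered data.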

I split the time-series data with TimeSeriesSplit, built a model with a pipeline, and trained it:

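from sklearn.pipeline import make_pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
# OrdinalEncoder is assumed to be imported already (e.g. from category_encoders)

pipe = make_pipeline(
    OrdinalEncoder(),
    SimpleImputer(),
    RandomForestClassifier(n_jobs=-1,
                           oob_score=True)
)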

dists = {'randomforestclassifier__max_depth' : [a,b]
         }

from sklearn.model_selection import RandomizedSearchCV

clf = RandomizedSearchCV(
    pipe,
    param_distributions = dists,
    n_iter = 10,
    cv = tscv,
    scoring = 'accuracy',
    verbose = 1,
    n_jobs = -1,
)
clf.fit(X_train, y_train)

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    473
    474         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 475             dtype_orig = np.result_type(*dtypes_orig)
    476
    477     if dtype_numeric:

<__array_function__ internals> in result_type(*args, **kwargs)

TypeError: invalid type promotion

Fitting raised the error above, and I was stuck on it for a long time.

1. Is something wrong with TimeSeriesSplit?

2. Is something wrong with RandomizedSearchCV?

3. Did I build the Pipeline incorrectly?

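I went through each of these for quite a while, but the actual cause was this line:

df.date = pd.to_datetime(df.date)  # convert the column to a datetime type
# dtype: datetime64[ns]

After this conversion the DataFrame contained a datetime64[ns] column. As the traceback shows, check_array calls np.result_type on all of the column dtypes, and NumPy cannot promote datetime64 together with numeric dtypes, so it raises the TypeError.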
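The failure can be reproduced without scikit-learn at all. The following is a minimal sketch with a made-up two-column DataFrame (the column names `date` and `price` are hypothetical); it lists the datetime columns and then makes the same np.result_type call that appears in the traceback. The exact wording of the error message may differ between NumPy versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: one datetime column next to one numeric column
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "price": [1.0, 2.0, 3.0],
})

# Find the columns that would trip up the dtype check
datetime_cols = df.select_dtypes(include=["datetime64"]).columns.tolist()
print("datetime columns:", datetime_cols)

# The same call check_array makes on the column dtypes
try:
    np.result_type(*df.dtypes)
    promoted = True
except TypeError as exc:
    promoted = False
    print("TypeError:", exc)
```

Running `select_dtypes` like this on the training frame before calling `fit` is a quick way to spot the offending column.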

Source: github.com/matplotlib/matplotlib/issues/9577 (matplotlib issue #9577, "Plotting pcolor with datetime along coordinate fails with TypeError: invalid type promotion", where pcolor or pcolormesh fails with the same exception when one of the coordinates has a datetime64 dtype)

So I removed that column, and the model ran fine...!

Phew......
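As a rough sketch of that fix, again with a hypothetical toy frame: option 1 is what I actually did (drop the datetime column), while option 2 is only an alternative I am assuming could work when the date information is worth keeping, namely turning it into plain integer columns before dropping the original.

```python
import pandas as pd

# Hypothetical toy frame with a datetime column
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-30", "2021-01-31", "2021-02-01"]),
    "price": [1.0, 2.0, 3.0],
})

# Option 1: drop the datetime column entirely
X_dropped = df.drop(columns=["date"])

# Option 2 (assumption, not from the original fix): keep the date
# information as numeric features, then drop the datetime column
X_features = df.assign(
    year=df["date"].dt.year,
    month=df["date"].dt.month,
    day=df["date"].dt.day,
).drop(columns=["date"])

print(X_dropped.dtypes)
print(X_features.dtypes)
```

Either way, no datetime64 column is left in the frame that gets passed to the pipeline.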
