1. Python (pandas): querying with different kinds of index
2. ValueError: buffer source array is read-only
Python (pandas): querying with different kinds of index
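Data stored in ordinary columns can also be queried. The uses of the index are summarized below:
1. More convenient data queries;
2. Better query performance;
3. Automatic data alignment;
4. Support for more, and more powerful, data structures.
Here is an example of querying data with the index: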
python
import pandas as pd
df = pd.read_excel(r"E:\Python-file\进阶\pandas\资料\**评价.xlsx")
print(df.head())  # preview the columns and the first rows
print(df.count())
python
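# Set "MOVIE_ID" as the index and also keep it as a regular column
df.set_index("MOVIE_ID", inplace=True, drop=False)
print(df.head())
print(df.index)
# The ID literal is missing from the source; the first ID in the data stands in for it
movie_id = df["MOVIE_ID"].iloc[0]
# Condition-based query: rows whose "MOVIE_ID" column equals movie_id
print(df.loc[df["MOVIE_ID"] == movie_id].head())
# Index-based query: look up the same movie_id directly by its index label
print(df.loc[movie_id].head())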
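The spreadsheet used above is not available here, so the following is only a minimal sketch on a small made-up frame. Every column name other than `MOVIE_ID`, and every value, is invented for illustration. It suggests that the condition-based query and the index-based query select the same row.

```python
import pandas as pd

# Hypothetical stand-in for the review spreadsheet; all values are invented.
df_demo = pd.DataFrame(
    {
        "MOVIE_ID": [101, 102, 103],
        "TITLE": ["A", "B", "C"],
        "RATING": [4.5, 3.0, 5.0],
    }
)
df_demo.set_index("MOVIE_ID", inplace=True, drop=False)

# Condition-based query: build a boolean mask over the column.
by_condition = df_demo.loc[df_demo["MOVIE_ID"] == 102]

# Index-based query: pass the label; wrapping it in a list keeps a DataFrame.
by_index = df_demo.loc[[102]]

print(by_condition)
print(by_index)
```

Passing a bare label (`df_demo.loc[102]`) instead of a list returns the row as a Series when the index is unique.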
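Using the index improves query performance:
1. If the index is unique, pandas uses a hash table, and a lookup is O(1): good;
2. If the index is not unique but is sorted, pandas uses binary search, and a lookup is O(log N): good;
3. If the index is neither unique nor sorted, every lookup scans the whole table, and a lookup is O(N): poor.
Here is a performance test example: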
python
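from sklearn.utils import shuffle
# Shuffle the rows so the index is no longer sorted
df_shuffle = shuffle(df)
print(df_shuffle.index.is_monotonic_increasing)  # False
print(df_shuffle.index.is_unique)  # True
# The ID literal is missing from the source; the first index label stands in for it
movie_id = df_shuffle.index[0]
print(df_shuffle.loc[movie_id])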
python
df_sorted = df_shuffle.sort_index()
print(df_sorted.head())
print(df_sorted.index.is_monotonic_increasing) # True
print(df_sorted.index.is_unique) # True
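The shuffled and sorted frames above can be compared by timing a lookup on each. The sketch below does this on synthetic data, since the original spreadsheet is not available. The frame size, the label values and the repeat count are assumptions chosen for illustration. Absolute timings depend on the machine and the pandas version, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed setup): n unique integer labels in random order.
n = 100_000
rng = np.random.default_rng(0)
labels = rng.permutation(n)

unique_unsorted = pd.DataFrame({"value": np.arange(n)}, index=labels)
unique_sorted = unique_unsorted.sort_index()

# Non-unique labels: every label appears twice.
dup_labels = np.concatenate([labels, labels])
dup_unsorted = pd.DataFrame({"value": np.arange(2 * n)}, index=dup_labels)
dup_sorted = dup_unsorted.sort_index()

target = int(labels[0])  # a label known to exist in every frame

frames = {
    "unique, unsorted": unique_unsorted,
    "unique, sorted": unique_sorted,
    "non-unique, sorted": dup_sorted,
    "non-unique, unsorted": dup_unsorted,
}
repeats = 200
for name, frame in frames.items():
    # The lambda is called inside this iteration, so it sees the current frame.
    seconds = timeit.timeit(lambda: frame.loc[target], number=repeats)
    print(f"{name:22s} {seconds / repeats * 1e6:10.1f} µs per lookup")
```

In a Jupyter notebook, `%timeit frame.loc[target]` is a shorter way to run the same measurement.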
The index aligns data automatically, for both Series and DataFrame:
python
s1 = pd.Series([1, 2, 3], index=list("abc"))
s2 = pd.Series([2, 3, 4], index=list("bcd"))
print(s1 + s2)
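The sketch below repeats the addition above and looks at what alignment does to labels that appear in only one operand. The `fill_value` variant and the two small DataFrames are additions for illustration and are not part of the original example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list("abc"))
s2 = pd.Series([2, 3, 4], index=list("bcd"))

# Labels present in only one operand ("a" and "d") produce NaN.
total = s1 + s2
print(total)

# Series.add with fill_value treats a missing label as 0 instead.
filled = s1.add(s2, fill_value=0)
print(filled)

# DataFrames align on the row index and on the column labels.
df1 = pd.DataFrame({"x": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"x": [10, 20]}, index=["b", "c"])
print(df1 + df2)
```

Only the labels shared by both operands ("b" and "c" for the Series, "b" for the DataFrames) get a numeric sum in the plain `+` case.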
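The index supports more, and more powerful, data structures:
1. CategoricalIndex: an index based on categorical data, which improves performance;
2. MultiIndex: a multi-level index, used for example for the result of a multi-key groupby aggregation;
3. DatetimeIndex: a time-typed index with powerful date and time methods.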
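As a rough illustration of the three index types listed above, the sketch below builds one small object for each. All data, column names and dates are invented for the example.

```python
import pandas as pd

# CategoricalIndex: labels drawn from a small fixed set of categories.
cat_idx = pd.CategoricalIndex(["low", "high", "low"], categories=["low", "high"])
by_cat = pd.Series([1, 2, 3], index=cat_idx)
print(by_cat.loc["low"])

# MultiIndex: grouping by more than one key yields a multi-level index.
sales = pd.DataFrame(
    {
        "city": ["X", "X", "Y"],
        "year": [2020, 2021, 2020],
        "amount": [10, 20, 30],
    }
)
grouped = sales.groupby(["city", "year"])["amount"].sum()
print(type(grouped.index).__name__)
print(grouped.loc[("X", 2021)])

# DatetimeIndex: slicing by date strings and resampling by period.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
ts = pd.Series([1, 2, 3, 4], index=dates)
print(ts.loc["2024-01-02":"2024-01-03"])
print(ts.resample("2D").sum())
```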
ValueError: buffer source array is read-only
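When calling scikit-learn's random forest interface, the model's prediction statement failed with ValueError: buffer source array is read-only
Solution:
Judging by the error message, the error may be related to Cython. See some of the discussions of this error on GitHub, including the one shown in Figure 1.
Check what pandas reports as installed:
Cython was originally shown as None, so I tried installing cython, following the official documentation (English, Chinese).
Once it is installed, the cython version shows up directly in the jupyter notebook that is already running (see Figure 2). However, jupyter notebook has to be restarted; without a restart the fix does not take effect. I lost an hour on exactly this point, assuming all along that the problem was the format or size of my data.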
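The check described above can also be run from code. The sketch below is an assumption about how one might do it, not the author's procedure. It uses the standard-library `importlib.metadata` (Python 3.8 or later, newer than the Python 3.6 environment in the traceback) to report whether a package is installed. `installed_version` is a hypothetical helper name. The last lines show how NumPy reports a read-only array, which is the state the error message names.

```python
import importlib.metadata

import numpy as np


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return importlib.metadata.version(package_name)
    except importlib.metadata.PackageNotFoundError:
        return None


# None here corresponds to the missing Cython entry described above.
print("Cython:", installed_version("Cython"))
print("scikit-learn:", installed_version("scikit-learn"))

# A read-only array reports writeable=False.
arr = np.arange(3)
arr.setflags(write=False)
print(arr.flags.writeable)
```

This only reads what is installed on disk. Modules already imported by a running kernel are not reloaded, which is consistent with the need to restart jupyter notebook noted above.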
The full error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input--effd> in <module>
----> 1 y_pred_rt = pipeline.predict_proba(nd_X_test)[:, 1]
      2 fpr_rt_lm, tpr_rt_lm, _ = roc_curve(nd_y_test, y_pred_rt)

~/.local/lib/python3.6/site-packages/sklearn/utils/metaestimators.py in <lambda>(*args, **kwargs)
      # lambda, but not partial, allows help() to work with update_wrapper
--> out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
      # update the docstring of the returned function
      update_wrapper(out, self.fn)

~/.local/lib/python3.6/site-packages/sklearn/pipeline.py in predict_proba(self, X)
      Xt = X
      for _, name, transform in self._iter(with_final=False):
--> Xt = transform.transform(Xt)
      return self.steps[-1][-1].predict_proba(Xt)

~/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in transform(self, X)
      """
      check_is_fitted(self)
-> return self.one_hot_encoder_.transform(self.apply(X))

~/.local/lib/python3.6/site-packages/sklearn/ensemble/_forest.py in apply(self, X)
      **_joblib_parallel_args(prefer="threads"))(
      delayed(tree.apply)(X, check_input=False)
--> for tree in self.estimators_)
      return np.array(results).T

~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self, iterable)
      # remaining jobs.
      self._iterating = False
-> if self.dispatch_one_batch(iterator):
      self._iterating = self._original_iterator is not None

~/.local/lib/python3.6/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
      return False
      else:
--> self._dispatch(tasks)
      return True

~/.local/lib/python3.6/site-packages/joblib/parallel.py in _dispatch(self, batch)
      with self._lock:
      job_idx = len(self._jobs)
--> job = self._backend.apply_async(batch, callback=cb)
      # A job can complete so quickly than its callback is
      # called before we get here, causing self._jobs to

~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
      def apply_async(self, func, callback=None):
      """Schedule a func to be run"""
--> result = ImmediateResult(func)
      if callback:
      callback(result)

~/.local/lib/python3.6/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
      # Don't delay the application, to avoid keeping the input
      # arguments in memory
--> self.results = batch()
      def get(self):

~/.local/lib/python3.6/site-packages/joblib/parallel.py in __call__(self)
      with parallel_backend(self._backend, n_jobs=self._n_jobs):
      return [func(*args, **kwargs)
--> for func, args, kwargs in self.items]
      def __len__(self):

~/.local/lib/python3.6/site-packages/joblib/parallel.py in <listcomp>(.0)
      with parallel_backend(self._backend, n_jobs=self._n_jobs):
      return [func(*args, **kwargs)
--> for func, args, kwargs in self.items]
      def __len__(self):

~/.local/lib/python3.6/site-packages/sklearn/tree/_classes.py in apply(self, X, check_input)
      check_is_fitted(self)
      X = self._validate_X_predict(X, check_input)
--> return self.tree_.apply(X)
      def decision_path(self, X, check_input=True):

sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()
sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree.apply()
sklearn/tree/_tree.pyx in sklearn.tree._tree.Tree._apply_dense()
~/.local/lib/python3.6/site-packages/sklearn/tree/_tree.cpython-m-x_-linux-gnu.so in View.MemoryView.memoryview_cwrapper()
~/.local/lib/python3.6/site-packages/sklearn/tree/_tree.cpython-m-x_-linux-gnu.so in View.MemoryView.memoryview.__cinit__()
ValueError: buffer source array is read-only
[Screenshot of the error]