Questions tagged [pandas]

Pandas is a Python library for data manipulation and analysis, e.g. dataframes, multidimensional time series and cross-sectional datasets commonly found in statistics, experimental science results, econometrics, or finance. Pandas is one of the main data-science libraries in Python.

0
votes
0 answers
6 views

Creating a pandas dataframe column based on multiple conditions

I tried two different approaches to solve this: 1. I used if conditions inside the dataframe. 2. I tried to use functions. I'm getting SyntaxError: invalid syntax. I'm still a beginner using ...
0
votes
0 answers
3 views

Pandas: add a hyperlink to one of the columns in a dataframe

I am new to Flask (Python), so please accept my apology if my problem seems trivial or stupid. I am displaying a dataframe as a table on an HTML page. I want to data one column into ...
0
votes
0 answers
5 views

How to parse json column having multiple dicts with same key present in a pandas dataframe read from a .xlsx file?

I have read an Excel file into a pandas dataframe using the read_excel function. One of the columns, 'filter', with 'object' dtype, actually has JSON in it. I have tried using json_normalize as: import json ...
0
votes
1 answer
10 views

I have a dataframe and I want to find the standard deviation for some specific cells

I'm trying to use pandas to find the standard deviation for the entries in some specific cells. I have tried using NumPy's std like so: numpy.std(df[columnName][j:i]) I have also tried using this: ...
-1
votes
0 answers
10 views

I have a dataframe that is built like this

TB_Geo TB_LOB TM1 Actual1 Actual2 Actual3 0 TB_Geo TB_LOB TM1 2018 2018 2018 1 Can IC_2 Can Jan Jan MTD Feb MTD I have a dataframe set up as follows *** I need to delete every ...
1
vote
1 answer
11 views

Convert Interval Outer Join SQL in Python Pandas Dataframe

I'm converting an Oracle SQL outer interval join to a pandas DataFrame. Below is the Oracle SQL: WITH df_interval AS (SELECT '1' id, 'AAA' interval, ...
2
votes
2 answers
33 views

How to select a specific category of bins in python?

I have a list of numbers which I separated into bins using pandas.cut(). How can I select one category of the bins? manhattanBedrmsPrice.head() 0 859 5 1055 9 615 11 663 13 1317 ...
1
vote
3 answers
31 views

Group daily data into months and count objects per user

I am trying to group a product count by month and user. I have daily data, so first I have to group it by month and then per user. See the table below: Date UserID Product 2016-02-02 1 ...
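One possible way to approach the month-and-user grouping described above is sketched here. The sample rows are hypothetical; only the column names Date, UserID and Product come from the excerpt, and the output column names Month and ProductCount are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the excerpt.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-02-02", "2016-02-15", "2016-02-20", "2016-03-01"]),
    "UserID": [1, 1, 2, 1],
    "Product": ["A", "B", "A", "C"],
})

# Map each daily date to its calendar month, then count products per (month, user).
month = df["Date"].dt.to_period("M").rename("Month")
monthly = (
    df.groupby([month, "UserID"])["Product"]
      .count()
      .reset_index(name="ProductCount")
)
print(monthly)
```

Grouping on a monthly period keeps the same month from different years separate, which grouping on the month number alone would not.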
2
votes
2 answers
27 views

How to combine multiple rows in a pandas dataframe which have only 1 non-null entry per column into one row?

I am using json_normalize to parse json entries of a pandas column. But, as an output I am getting a dataframe with multiple rows with each row having only one non-null entry. I want to combine all ...
0
votes
1 answer
15 views

Pandas: Union of Dataframes

Considering two dataframes like the ones below: import pandas as pd df = pd.DataFrame({'id_emp' : [1,2,3,4,5], 'name_emp': ['Cristiano', 'Gaúcho', 'Fenômeno','Angelin', 'Souza']}) ...
0
votes
0 answers
12 views

Memory leak when trying to regularize an uneven time series with Pandas

I am trying to regularize an uneven time series with Pandas as in this example https://megam.info/a/39730730/10005441. However, my process gets killed with exit code 137 ("(interrupted by ...
0
votes
1 answer
21 views

Excel file containing both boolean and “0” and/or “1” in same column not imported correctly with read_excel

I need to import an excel sheet as is in a dataframe in pandas. When using the read_excel function with dtype=object, I still get "interpreted" values. I am using Python 3.5.4, pandas 0.23.4 in ...
0
votes
0 answers
8 views

How to convert a string Series to datetime with and without time

When converting a Series from string to datetime using astype function, if the string doesn't have the time component, it assumes different behaviours when converting and swaps month and day in the ...
0
votes
0 answers
15 views

Pandas df replace value in column with dict and regex

I have a dict which is as follows: MyDict = { 'Type': 'D', 'Tariff': 'T2', 'Profile' : 1, 'QuoteType' : 'Firm' } I have a df which looks like this. My goal is first to replace the @Type with ...
-1
votes
0 answers
29 views

How to read every excel file in for loop?

I have read this but am unable to solve it. My question is that I put every file in the for loop, but I found that it only takes the last Excel file. For example: m=['paketone4000.dump.xlsx'...
0
votes
1 answer
11 views

Python/Pandas - Preparing Source Data with Weekly Columns to Time Series

I tried to google a question like this: How to transform weekly data for time series analysis in Pandas? This question is hard to search without results that talk straight about re-sampling data from ...
0
votes
2 answers
17 views

How do I match similar names to a given row if they appear in one year and not the next and appear again?

Actual Question (couldn't add to title because it's too long): I have facility names in a list of lists, where each list is for a corresponding year. I want to create a data frame, with each row ...
0
votes
1 answer
16 views

How do you change the column order when using pandas' crosstab? For example, I don't want the days of the week to be in alphabetical order

Put simply, I want to change the order of the columns for pandas' crosstab. Right now, it's in alphabetical order, e.g. Friday, Monday, Saturday, Sunday, Thursday, Tuesday, Wednesday. I would like ...
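A minimal sketch of one way to get weekday order instead of alphabetical order, assuming the days end up as the crosstab columns. The data and the column names "shift" and "day" are hypothetical, since the excerpt does not show the original frame.

```python
import pandas as pd

# Hypothetical data; the column names "shift" and "day" are assumptions.
df = pd.DataFrame({
    "shift": ["am", "pm", "am", "pm", "am"],
    "day": ["Monday", "Friday", "Sunday", "Monday", "Wednesday"],
})

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# crosstab sorts its column labels, so string day names come out alphabetically.
table = pd.crosstab(df["shift"], df["day"])

# Reindex the columns into weekday order; days missing from the data get 0.
table = table.reindex(columns=day_order, fill_value=0)
print(table)
```

Another option is to convert the day column to an ordered categorical before calling crosstab, which carries the ordering into later operations as well.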
1
vote
1 answer
33 views

How to create a dictionary with one index key column and multiple value columns

I have a dataframe df with 3 columns A,B,C. I want column A to be the index and key and columns B and C as A's values. I have tried the below: def cellDict(): df_set_index('A')['B','C'] x= ...
0
votes
0 answers
8 views

Invocation exception in executing graphviz

I tried to plot a graph of a decision tree in a Jupyter notebook, and every time I executed it I got an error. I tried to fix it by reinstalling Anaconda and pydotplus using conda, and also installing graphviz ...
-1
votes
3 answers
27 views

Pandas: How to return all rows if a column string contains at least a certain number of strings from a list?

Say that I have a list of strings, such as listStrings = [ 'cat', 'bat', 'hat', 'dad', 'look', 'ball', 'hero', 'up'] Is there a way to return all rows if a particular column contains 3 or more ...
0
votes
2 answers
16 views

Subset df using pandas filter and datetime functions

I'm trying to filter the following df: datetemp | gamenum |score 2019-6-2 123 2 2019-4-5 314 4 2019-5-11 344 2 2019-4-29 324 1 2019-2-28 325 9 2019-1-30 231 ...
1
vote
1 answer
19 views

aggfunc to get an arbitrary value in cells of a pandas pivot_table

I'd like to use pivot_table to show an arbitrary value of a column in each cell. For example, given a DataFrame like this: df = pd.DataFrame({'x': ['x1', 'x1', 'x2'], 'y': ['a', 'b'...
1
vote
0 answers
22 views

How do I best store and call upon a pandas script that operates on a single XLSX file?

I have a Python script that makes use of Pandas to read in an Excel XLSX file to a DataFrame, perform some calculations & grouping, and generate a new DataFrame with the information I desire. All ...
2
votes
0 answers
27 views

Pandas: Difference between dot and [] [duplicate]

I thought that with pandas, [] and dot have the same meaning and can be used interchangeably. However, that's not the case. Can you tell me the difference? def load_event_data(): df = pd.read_csv('data.csv', ...
0
votes
0 answers
13 views

Having problems while using dask map_partitions with string matching algorithm

I'm having some problems applying a text search algorithm with parallelized dask infrastructure. I'm trying to find the best match for 40,000 strings in a series object against a 4000 string list. I ...
1
vote
1 answer
28 views

Pandas accumulate data for linear regression

I try to adjust my data so total_gross per day is accumulated. E.g. `Created` `total_gross` `total_gross_accumulated` Day 1 100 100 Day 2 100 200 Day 3 100 ...
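The running total described above can be sketched with a cumulative sum. The frame below reuses the column names from the excerpt and assumes one row per day, already sorted by Created. Since the excerpt is truncated, the Day 3 value is simply what the sum produces, not a quoted figure.

```python
import pandas as pd

# Column names follow the excerpt; one row per day, already in date order (assumption).
df = pd.DataFrame({
    "Created": ["Day 1", "Day 2", "Day 3"],
    "total_gross": [100, 100, 100],
})

# cumsum adds each row's value to the sum of all rows before it.
df["total_gross_accumulated"] = df["total_gross"].cumsum()
print(df)
```

If the data has several rows per day, it would first need to be aggregated per day (for example with groupby and sum) before taking the cumulative sum.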
0
votes
0 answers
17 views

Pandas code is running into memory error with lots of available ram

sklearn fit, MinMax scaler etc run into memory error when I am running the jupyter notebook on ec2-instance of m5a.xlarge (16gb ram). Whereas when I run the same code on my local machine (Macbook air, ...
0
votes
0 answers
18 views

Compare master and child Dataframe and add new rows to master based on two column values

I have two Dataframes as: Master_DF: Symbol,Strike_Price,C_BidPrice,Pecentage,Margin_Req,Underlay,C_LTP,LotSize JETAIRWAYS,110.0,1.25,26.0,105308.9,81.05,1.2,2200 JETAIRWAYS,120.0,1.0,32.0,96156.9,...
1
vote
2 answers
20 views

Pandas access first column with duplicate column names

Looking for some help accessing the first empty df column that is also a duplicate name, by name. Consider this dataframe import pandas as pd df = pd.DataFrame(columns=['A', 'B', 'C', 'C', 'C', 'C', ...
0
votes
1 answer
30 views

Python issue with apostrophe when using replacements function

I'm trying to replace a string that has more than one apostrophe but it's messing up the string being executed properly. The first replacement works fine, but the second needs to replace this: "{u'...
-1
votes
0 answers
28 views

How to fix cannot reshape array of size 146231960 into shape of (4341)

I am trying to read a matrix with dimensions 33694 by 4341, where I am setting the input_shape =4341. This line in my code gives the ValueError: cannot reshape array of size 146231960 into shape (...
0
votes
2 answers
26 views

Pandas: How to read a public CSV file from Google Drive?

I searched similar questions about reading CSV from a URL, but I could not find a way to read a CSV file from Google Drive. My attempt: import pandas as pd url = 'https://drive.google.com/file/...
0
votes
2 answers
25 views

CUMSUM addition as below

I have to calculate the cumsum addition below. A should be blank. B should stay as it is. C should be 31 + 30 = 61 (the previous item plus the present item), D = 61 + 31 = 92, and so on. data: ...
0
votes
0 answers
20 views

Replace for loop with dataframe.apply()

Objective is to replace a for loop with dataframe.apply(). Below is the code for the for loop: #ma is moving average of a number of days say 100 #days is the number of days for which stock data is ...
-1
votes
1 answer
34 views

Converting string to datetime with format

I have a column in my dataframe that looks like this, but the current dtype is object. This dataframe is imported from a csv where there were no column heads and it didn't recognize the type when it ...
0
votes
0 answers
21 views

Dataframe split a column with arrays into multiple rows [duplicate]

I have a dataframe as follows: imagename date seqid locid image1.jpg 16-05-2019 19:08:16 [7, 23, 29] vp1 image2.jpg 16-05-2019 19:08:17 [15, 23, 48,3798] vp1 ...
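For the list-valued column shown above, one option is DataFrame.explode, sketched here on the two rows visible in the excerpt. This assumes pandas 0.25 or later, and that seqid holds real Python lists rather than strings that merely look like lists.

```python
import pandas as pd

# The two rows visible in the excerpt; seqid is assumed to hold Python lists.
df = pd.DataFrame({
    "imagename": ["image1.jpg", "image2.jpg"],
    "date": ["16-05-2019 19:08:16", "16-05-2019 19:08:17"],
    "seqid": [[7, 23, 29], [15, 23, 48, 3798]],
    "locid": ["vp1", "vp1"],
})

# explode emits one row per list element and repeats the other columns.
long_df = df.explode("seqid").reset_index(drop=True)
print(long_df)
```

If the column was read from a CSV, the values are likely strings, and they would need parsing (for example with ast.literal_eval) before exploding.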
1
vote
1 answer
29 views

How to append a Modin pandas dataframe to another?

I am working on performing calculations on large files, around 6GB each, and came across Modin pandas, which I heard is optimized compared to pandas. I need to read a CSV file in chunks and perform ...
0
votes
0 answers
15 views

Getting data from dataframe where column value equals x - SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame [duplicate]

New to Python and dataframes. My issue stems from my general lack of understanding of how to access data in dataframes correctly in different situations. I hope you can and will help me. I have a ...
1
vote
3 answers
45 views

using isin() for a column that has list values

I have two dataframes. Dataframe A has a column that consists of list values of ids (named items). Dataframe B has a column of int values of ids (named id). Dataframe A: date | items 2019-...
0
votes
0 answers
23 views

How to drop several rows with specific values in columns

I have a dataframe with ' ?' instead of NaNs where values are missing. How can I remove all rows where ' ?' shows up? To be more specific, there are only three columns with question marks in them....
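A minimal sketch of dropping the rows that contain the ' ?' placeholder described above. The column names and sample values are hypothetical, because the excerpt does not name the affected columns, and two columns are used here for brevity instead of three.

```python
import pandas as pd

# Hypothetical data; the placeholder ' ?' (with a leading space) is from the excerpt.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": [" Private", " ?", " State-gov"],
    "occupation": [" Sales", " Tech-support", " ?"],
})

# Columns assumed to contain the placeholder.
cols = ["workclass", "occupation"]

# True for every row where at least one of the chosen columns equals ' ?'.
has_placeholder = df[cols].eq(" ?").any(axis=1)
cleaned = df[~has_placeholder]
print(cleaned)
```

An alternative is to turn the placeholder into NaN first, with df.replace(" ?", float("nan")), and then call dropna, which also makes the missing values visible to other pandas functions.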
0
votes
1 answer
10 views

How to fix “XLRDError: ZIP file contents not a known type of workbook”

I have written this code and it gives an error like this. Please help me solve this problem. I have already installed xlrd, and installed it again using pip install xlrd. import numpy as np import pandas as pd import ...
0
votes
0 answers
7 views

Creating 2 data frames from 2 different Excel sheets. I want to pick Col1 from excel1 & Col2 from excel2. Store these cols in a new Excel sheet

Firstly, both columns are different, with different data. I am gathering these columns' data from 2 different sheets and adding those collected columns to a new Excel sheet. ...
-2
votes
0 answers
12 views

How to create noisy labels (change correct labels to wrong ones) for a portion of a dataset?

I have a dataset for 14 different diseases, so I need to know the minimum threshold for the dataset to be verified by doctors for training model.
0
votes
1 answer
43 views

Python - Combining Like + '%' to merge two Pandas dataframe

I've the following Pandas dataframes with following schemas: df_1: id identifier Input data here: id identifier 1 SQL 2 JAVA 3 C# df_2: id string_resume string_long Input ...
-1
votes
3 answers
37 views

How to create a nested json from a json and replace the different key values with single key

I have a json like this (this is formed by converting a pandas data frame to json): "columns0": { "0": 9100, "4": 8550, "9": 0, "11": 1.5, "12": 35000, "13": 0 }, "columns1": {...
0
votes
1 answer
17 views

Incorrect marker sizes with Seaborn relplot and scatterplot relative to legend

I'm trying to understand how to get the legend examples to align with the dots plotted using Seaborn's relplot in a Jupyter notebook. I have a size (float64) column in my pandas DataFrame df: sns....
1
vote
0 answers
12 views

What is OSError: [Errno 95] Operation not supported for pandas to_csv on colab?

My input is: test=pd.read_csv("/gdrive/My Drive/data-kaggle/sample_submission.csv") test.head() It ran as expected. But, for test.to_csv('submitV1.csv', header=False) The full error message that ...
0
votes
1 answer
20 views

I need help creating a graph in python with matplotlib. Data must be read from JSON

I have a JSON file like this: {"rpm": [ {"Clock": "09:55:44", "Value": 767.0}, {"Clock": "09:55:45", "Value": 759.0}, {"Clock": "09:55:47", "Value": 2302.0}, {"Clock": "09:55:48", "Value": 1973.0},...
0
votes
1 answer
11 views

Remove pandas dataframe from sql_alchemy database

I'm pretty new to databases. How can we remove a pandas dataframe from a sqlite database using flask_sqlalchemy? I added a dataframe to the database using df.to_sql. But how would I remove it? from ...