Questions tagged [dataframe]

A data frame is a tabular data structure. Usually, it contains data where rows are observations and columns are variables of various types. While "data frame" or "dataframe" is the term used for this concept in several languages (R, Apache Spark, Deedle, Maple, the pandas library in Python and the DataFrames library in Julia), "table" is the term used in MATLAB and SQL.
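As a rough illustration of the description above, here is a minimal sketch in pandas (one of the libraries named). The column names and values are made up for the example and are not taken from any question on this page.

```python
import pandas as pd

# Hypothetical sample data: each row is one observation,
# each column is one variable with its own type.
df = pd.DataFrame(
    {
        "country": ["Argentina", "Mexico", "France"],  # text variable
        "population_m": [45.8, 126.7, 67.8],           # numeric variable (illustrative values)
        "landlocked": [False, False, False],           # Boolean variable
    }
)

# Shape is (number of observations, number of variables).
n_rows, n_cols = df.shape

# Map each column name to the name of its data type.
column_types = df.dtypes.astype(str).to_dict()

print(df)
print(column_types)
```

The same idea carries over to the other implementations listed, although constructors and type names differ between them.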

0
votes
1 answer
19 views

How do I multiply a row value in a specific column in a dataframe with its own lagged value efficiently?

I have a dataframe with two columns 'actp' and 'modr': 'actp' contains an actual price series, 'modr' contains forecasted returns for the series. I want to create a third column 'modp' which takes the ...
0
votes
1 answer
20 views

Insert values from an existing column when using mutate and case_when

I want to add a z column to the ds dataframe with the corresponding "y" value, when the condition in the x column is met. library(tidyverse) ds <- tibble(x = 1:5,y = 6:10) ds%>% mutate( ...
0
votes
0 answers
13 views

Subset one special day over a period of time [duplicate]

I have a dataframe that contains daily data over 50 years. I want to subset the data of every May 1st of each year. I already converted the date like this: Temp[,1] = as.Date(Temp[,1],format="%Y-%m-...
0
votes
3 answers
17 views

How to add a suffix to multiple variables without considering NA?

For this table, it is necessary to add the suffix _T to every variable, without considering NA. T1: var1 var2 var3 Argentina Italy NA Mexico Chile NA France Hungary NA Spain UK ...
0
votes
1 answer
17 views

How do you delete only the duplicates that fulfill another condition in R?

I want to clean up this data set (Example Table). It contains many duplicates. I want to delete only the duplicates from the UUID column that have the highest value in the column Shape_Area. A loop must ...
0
votes
0 answers
22 views

R - Count occurrences of a value between pairs of other values in a vector

I have a dataframe like below: col1 001 x x 002 001 002 x 003 004 x x 003 x 004 x x 005 005 x I would like to add a second column containing Boolean values indicating whether "x" is located ...
-1
votes
1 answer
25 views

How to create a new column with the continent names of each country?

I have a list of countries in a column and I need to get their continent in a new column. My list has 180 different countries: country Switzerland France Denmark China Argentina and I need ...
4
votes
1 answer
59 views

Pandas Interview Question - Compare Pandas-Joins and Ideally Provide the Fastest Method

A while ago I was interviewed for a Data Scientist role. Strangely, without asking about Machine Learning or Data Science or even Statistics, I was given a small task to join two pandas dataframes, and ...
0
votes
1 answer
36 views

Local variable gets changed unintentionally in Python

I have a pandas dataframe "df" on which I apply several functions. I do not want to change the values of the original dataframe. All my functions look like this: def func(x): # do some stuff with x ...
0
votes
1 answer
18 views

Previewing large datasets in the front end of a web app (Angular)

I'm building an Angular 7 app where the user is able to upload big data sets of around 10.0000x10.0000 or even more. At some step after uploading the files, the user should be able to make a preview of the ...
0
votes
0 answers
13 views

How to add or attach a specific string to the columns of a dataframe?

I have a dataframe with many columns as follows: my_data: X,Y,Z,A,BS,D,..., Tf Now, I want to add a specific string, e.g., 'new' or a number from 1 to n, to the columns. So my desired dataframe ...
1
vote
2 answers
28 views

find equal rows between data frames, including NA as a value

I have two data frames: df = structure(list(x = c(NA, NA, "b", "b", "b"), y = c("f", "f", "f", "g", "g")), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame")) df2 = structure(list(x =...
0
votes
0 answers
7 views

Select fields for table output

I am fairly new to Shiny. I have created 3 tabpanels and the filters I use in the first tab need to apply to the next two tabs. That is done. However it seems like I need to include all fields used in ...
1
vote
1 answer
28 views

Remove Duplicates from Col X based on condition in Col Y

I have a data frame in R that has duplicates in one of the columns; however, I only want to remove the duplicates based on a specification in another column. For example: DF: X J Y 1 2 ...
1
vote
1 answer
32 views

Filling missing values in rows using Apache Spark

I have a specific requirement to fill all Values (categories) against a column. For example, as shown in the below table. I want a way to fill the 'UNSEEN' and 'ASSIGNED' category for code HL_14108. ...
0
votes
1 answer
39 views

JSON to Python Dataframe convert [duplicate]

I need help. There is JSON data (sample below). I am trying to create a DataFrame using Python, and I want to save it to Excel. Thank you! data = {'2': {'groupNo': '29', 'korea': '1', ...
0
votes
1 answer
14 views

'lines' and 'lines+markers' mode is not working in plotly python

I am trying to plot a time series with plotly's 'lines+markers' mode. Although I was successful in plotting both lines and markers with some columns, I am unsuccessful for a few columns and on ...
2
votes
2 answers
44 views

count a column by a time period in pandas dataframe

I would like to get a count for a column by a time period in a pandas dataframe. My table: id1 date_time adress a_size reom 2005-8-20 22:51:10 75157.5413 ...
0
votes
2 answers
21 views

How to calculate a new column based on other columns using a lookup approach in R?

I am trying to calculate another column in a dataframe based on other columns and a lookup table. I have a simple example that shows only a few rows (my real dataset contains millions of rows). I ...
0
votes
0 answers
19 views

Looping over multiple dataframes to calculate required task (Equal-weighted portfolio sharpe ratio)

I would like some help in looping over multiple DataFrames. The code below generates the required output for the 10 industry portfolio. I could simply change the variable of dfxsind to dfxsmom to ...
4
votes
1 answer
41 views

Add values to existing rows - DataFrame

I'm appending some weather data in Japanese (from a JSON dict) to a DataFrame. I would like to have something like this 天気 風 0 状態: Clouds 風速: 2.1m 1 NaN 向き: ...
0
votes
3 answers
36 views

do calculations for multiple columns with some conditions in pandas dataframe

My question is related to my previous question, but it is different, so I created a new post even though the data is the same. I would like to do some calculations for multiple columns with some ...
0
votes
1 answer
22 views

R: Assign a value to a column inside a loop over rows of a dataframe

I am looping over the rows of a dataframe myDF$myCol <-NA for(k in 1:nrow(myDF)){ ................... myDF[k,][myCol] <- x } but this gives me Error in `[<-.data.frame`(`*tmp*`,...
0
votes
1 answer
24 views

R: index to a dataframe row by index and column name gives all levels

I am using a for loop to get cell values of a dataframe for(k in 1:nrow(myDF)){ for(h in names(myDF[k,])){ a<-myDF[k,][["colname"]] print(a) } } It gives the ...
0
votes
1 answer
29 views

Spark filter not working as expected.. 'Column' object is not callable

When using the "and" clause in filter on Spark Dataframe, it returns Spark.SQL.Column instead of Spark Dataframe. But for one condition it works fine. How to show() or iterate through Spark Sql ...
0
votes
0 answers
16 views

How do I split a text column's rows based on a delimiter or phrase and keep the name on each split?

I have a movie review dataframe with two columns: Name of Movie and Review. I want to split the review into sentences based on delimiter .,?! or a phrase. I then want to take that split review and ...
0
votes
0 answers
21 views

within one query, do some calculations between multiple columns in the “group by” query results in pandas dataframe

I would like to do some calculations for multiple columns in the "group by" query results in pandas dataframe. my table: (the actual rows can be 30k, all have the same date but different time) id1 ...
3
votes
1 answer
23 views

Inserting missing rows with imputed values in Python

Problem: How can you insert rows for missing YEARS, with imputed annual SALES? Progress: The following code computes the sales differences. However, it is for one year, using the explicit iloc ...
1
vote
2 answers
31 views

conditionally merge cells' contents in a column

Looking for a pandanic way to turn the following df: name desc 0 A a 1 NaN aa 2 NaN aaa 3 B b 4 NaN bb into: name desc 0 A a aa ...
0
votes
1 answer
17 views

pandas groupby when group keys are to be treated separately if key changed between them

I believe the example of input and output will give the best explanation. But in words - I have data I want to group by user and cluster, and extract min and max timestamp in a group and count the ...
1
vote
1 answer
29 views

R: Loop through columns, select value from a column and write it to a new column in same row

I have a dataframe in the following format id var1 val1 status1 var2 val2 status2 var3 val3 status3 123 a 12 false b 23 true c 34 true Here I want to go through each column of ...
-3
votes
0 answers
27 views

Detecting cheaters [on hold]

Given a DataFrame df with two columns, the roll numbers of students and their answers to a question, I want to find the cheaters in the form of answer as column and the corresponding roll num. ...
1
vote
1 answer
16 views

How to Create Values based on Start-Stop Info in Separate Column

I have a very messy dataset created by a research device. This data shows a physiological measure ("Physio") for every few milliseconds ("Time"). The output lists several user messages, such as when ...
2
votes
1 answer
37 views

Grouping & aggregating large dataset by multiple columns

I'm trying to group my data by multiple columns and then aggregate values in other columns. While I've found numerous examples of this online, I'm running into issues when I attempt to apply the same ...
0
votes
1 answer
33 views

R-code: adding a column (of a mean score) to an existing data.frame does not return the correct numbers

I am trying to clean a data set so I can analyse it with ease. I have a data set that looks like this: z a b c d a_1 b_2 c_3 d_4 ab_1 ab_2 Participant1 A 1 3 4 ...
1
vote
0 answers
11 views

Why does lm generate NA for each independent variable?

I tried to make a linear regression with the lm function, but the output is NA for every independent variable. The dataframe is numeric. I have already tried to change the independent variable and ...
0
votes
0 answers
34 views

Delete Rows in Pandas Dataframe by Condition

I have a pandas dataframe with states and counties but some of the counties are just the state as a whole. I'm trying to iterate through line by line to delete rows in which the df['STNAME'] == df['...
1
vote
1 answer
21 views

Error converting a factor value into a numeric value

I have a data.frame with a column called weight in a factor format, and I have these values: Weight 8.248 5.365333333333333 5.725333333333337 and I need to convert this to a numeric value, with the ...
1
vote
3 answers
24 views

How to replace unique values with index number using mutate function?

I would like to replace unique values with an index number using dplyr::mutate. I am grouping by a couple of different variables to access the appropriate subset of my dataframe. head(df) ...
2
votes
2 answers
35 views

How to filter rows in dataframe by biggest date time?

I'm trying to filter the rows in a data frame that has different dates for the different fruits; I want to get only the row with the newest date for each fruit. I'm doing it in Python 3. import ...
1
vote
3 answers
24 views

How to calculate percent differences in a table in R

I have a csv file where rows 1-5 represent one state, 5-10 another, etc... I also have a column with years 1970,1980,..,2010 repeated for each state. In R (although I'm not opposed to a solution in ...
1
vote
1 answer
18 views

Trouble using dplyr::order to rank values from smallest to largest including positive integers smaller than 1

I want to rank the euclid_dist of combinations, grouped by pitch_2 in my dataframe, from smallest to largest. My dataframe has over 80 million combinations and a bunch of different pitch_2s, which is why I'...
0
votes
2 answers
23 views

Unexpected character when writing to Excel using Pandas

I have a dictionary like this: film = { 'ID': [], 'Name': [], 'Run Time': [], 'Genre': [], 'link': [], 'name 2': [] } Then I populate it in a for loop, like this: film['ID']....
1
vote
1 answer
42 views

Pandas: increase speed of rolling window (apply a function)

I'm using this code to apply a function on my data-frame using rolling window. The main issue is the size of this data-frame (data) is very large, and I'm searching for a faster way to do this. ...
0
votes
0 answers
11 views

check the csv if empty & go to next steps in Pandas

I have Python code to read through the CSVs and write to an HDF5 output file. I'm looking for a code enhancement where, if one of the "CS csv" has no data, the code should skip that file & read ...
1
vote
0 answers
16 views

Make a column with duplicated values into multiple dataframes [duplicate]

I have a data frame that is called names as follows Name,no1,no2 john,12,14 john,23,24 tom,24,26 tom,25,27 pat,15,16 pat,16,17 What I want to do is to have 3 data frames for john, tom and pat ...
0
votes
1 answer
27 views

How to create new DataFrame columns with extracted data from existing columns

Hi guys I have the following DataFrame: Index Numbering Description 1 A Agri. and Forest 2 1 Agri. 3 1.1 ...
1
vote
1 answer
36 views

How to match values between columns in a dataframe

I would like to get the matches from one column with the other columns in a dataframe. The attribute column is a list. Below is an example: date tableNameFrom tableNameJoin ...
2
votes
3 answers
36 views

Dataframe.lookup and map combination resulting in column label error

I have a large dataframe of around (1200, 10) of mostly strings where I have to append a new column, say 'Z', based on an existing reference column, say 'Y', whose values are 'A', 'B', 'C', or unknown (NaN ...
0
votes
0 answers
32 views

Why does repartition not take effect in a huge PySpark dataframe?

I have 10 nodes with 32 cores and 125 g each. I also have a dataframe called oldEmployee with two columns, employeName and its salary. df = .. oldEmployee = df.rdd.map(lambda item:....) ...