I have a pandas DataFrame with 5M records, 400k groups, and two columns. I am trying to unstack the rows into columns and then join all of the column values into a single column. To illustrate, here is a subset of my data:

EVENT_ID     DIAGNOSIS
  24601           637
  24601          1561
  24601           360
  24601          3002
  82903          1580
  82903           923
  82903           986
  94261          1940
  94261          2353
  94261          4553

I used the following code to pivot the DataFrame:

df_pivot = df.pivot(index='EVENT_ID', columns='DIAGNOSIS', values='DIAGNOSIS').add_prefix('').reset_index()

On the complete data, it raises this error:

Unstacked DataFrame is too big, causing int32 overflow

The same code works when I run it on a subset of the data.
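
A rough way to see why the subset works while the complete data does not is to estimate the size of the wide result before pivoting. This is only a sketch: it assumes pandas raises this error when the number of cells in the unstacked frame (rows times columns) exceeds the int32 maximum, and it uses the sample rows above as stand-in data. The names n_rows, n_cols, n_cells, and exceeds_int32 are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the real frame has about 5M rows.
df = pd.DataFrame({
    "EVENT_ID": [24601] * 4 + [82903] * 3 + [94261] * 3,
    "DIAGNOSIS": [637, 1561, 360, 3002, 1580, 923, 986, 1940, 2353, 4553],
})

# The wide frame would have one row per EVENT_ID and one column per
# distinct DIAGNOSIS value.
n_rows = df["EVENT_ID"].nunique()
n_cols = df["DIAGNOSIS"].nunique()
n_cells = n_rows * n_cols

# Assumption: the overflow error is triggered once the cell count
# exceeds the largest int32 value.
exceeds_int32 = n_cells > np.iinfo(np.int32).max

print(f"{n_rows} rows x {n_cols} columns = {n_cells} cells")
print("too big to unstack:", exceeds_int32)
```

Under that assumption, with 400k groups the limit would be reached at roughly 5,400 distinct DIAGNOSIS values, which may explain why only the complete data fails.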

I expect the result to look like this:

EVENT_ID  637  1561  360  3002  1580  923  986  231  1940  2353  4553  all_diagnosis
24601     637  1561  360  3002                                         637|1561|360|3002
82903                           1580  923  986                         1580|923|986
94261                                                1940  2353  4553  1940|2353|4553

Eventually I want to create a dictionary mapping EVENT_ID to all_diagnosis, which looks like:

{
24601 : 637|1561|360|3002
82903 : 1580|923|986
94261 : 1940|2353|4553
}

I already have code that builds this dictionary from the wide format, and it works on the subset of the data.

When I run the same code on the complete data, it fails with the error above. I would really appreciate any suggestions on how to do this for the complete data.

  • I did post the error I get when I try it on the complete data: "Unstacked DataFrame is too big, causing int32 overflow" – akash bachu Apr 11 at 14:34
  • Do you really need the wide format? You can build such a dictionary directly from long format. – Parfait Apr 11 at 14:36
  • @Parfait. I assumed that to create such a dictionary, we first need to convert to wide format. To be honest, I just want the dictionary, not the wide format. – akash bachu Apr 11 at 14:39
  • df.groupby('EVENT_ID')['DIAGNOSIS'].apply(list).to_dict() – Parfait Apr 11 at 14:44
  • @Parfait. Thanks. It's working :) – akash bachu Apr 11 at 14:53
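
The one-liner from the comments returns a list of diagnoses per EVENT_ID. If the pipe-delimited strings shown in the question are what is needed, a minimal sketch along the same lines might look like the following. It builds the dictionary directly from the long format, so the wide frame is never created. The sample frame and the variable name all_diagnosis are illustrative, and sort=False is an optional choice that keeps the groups in order of first appearance.

```python
import pandas as pd

# Sample rows from the question, in long format.
df = pd.DataFrame({
    "EVENT_ID": [24601] * 4 + [82903] * 3 + [94261] * 3,
    "DIAGNOSIS": [637, 1561, 360, 3002, 1580, 923, 986, 1940, 2353, 4553],
})

all_diagnosis = (
    df.assign(DIAGNOSIS=df["DIAGNOSIS"].astype(str))  # str.join needs strings
      .groupby("EVENT_ID", sort=False)["DIAGNOSIS"]
      .agg("|".join)                                  # one string per EVENT_ID
      .to_dict()
)

print(all_diagnosis)
# {24601: '637|1561|360|3002', 82903: '1580|923|986', 94261: '1940|2353|4553'}
```

Because the groupby works on one group at a time, the result grows with the number of rows rather than with the number of EVENT_ID values times the number of distinct DIAGNOSIS values.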
