Questions tagged [pandasql]

pandasql allows you to query pandas DataFrames using SQL syntax. It works similarly to sqldf in R.

1
vote
0answers
8 views

Errors writing a dataframe to DB2 using Pandas to_sql

I am trying to load data from a pandas dataframe to an IBM DB2 Data Warehouse environment. The table already exists so I am just appending rows to the table. I have built the dataframe to mirror ...
0
votes
1answer
35 views

How to remove duplicates based on text similarity across rows in Pandas

I have a dataset of news headlines. I'd like to remove duplicate or highly similar headlines based on textual similarity with headlines of the past ten days. For highly similar headlines, I want to ...
0
votes
2answers
33 views

Exit loop in python if SQL query doesn't bring any data

I am new to Python and have been given a task to download data from different Database ( MS SQl and Teradata ). The logic behind my code is as follow : 1: Code picks up data for Vendor from an excel ...
0
votes
1answer
31 views

In Python, using pandasql: query return “Empty DataFrame”

In Python, using pandasql: query return "Empty DataFrame" import pandas as pd import sqlite3 as db import pandasql dataSet = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/...
1
vote
2answers
50 views

summing all amounts by date in respect with individuals

I have this Dataframe df: payout person1 person2 date 1 300.0 LA NaN 2012-02-01 2 500.0 DO NaN 2012-02-01 3 600.0 DO NaN 2012-02-...
2
votes
2answers
35 views

Pandas: Aggregated and Group by - IDE: Pycharm

QQ - IDE: Pycharm - I am using the below Dataframe Sample format Name Business SegmentID Revenue Margin OrderQuantity James Commercial 1001 1500 100 1 ...
0
votes
0answers
19 views

use of local() in pandasql

The pandasql library uses the locals() function to execute SQL in the dataframe. def select_first_50(filename): students = pandas.read_csv(filename) students.rename(columns = lambda ...
0
votes
0answers
19 views

How to use python variables in pandasql?

I want to use Course Identity python variable in the pandasql query, but it is not working. Here is my code: Course_Identity='Mathematics' query=("""SELECT user_id, course_id, count( DISTINCT ...
0
votes
1answer
48 views

What's the equivalent for LISTAGG in Pandas?

I have the following aggregations: def my_agg(x): names = { 'first_seen': x['first_seen'].min(), 'last_seen': x['last_seen'].max(), 'md5s': x['md5s'].tolist(), } ...
0
votes
0answers
45 views

SQL query column doesn't exist

I have seen many answers about similar SQL queries, but for some reason none of the solutions works with my query. I am trying to read my query into a pandas data frame matching them to my (list) ...
0
votes
1answer
16 views

Variable used into a PSQL query

I have a problem with my code python, i'm using Pandasql, and what i want is to use my (enddate) in a query so: enddate = pd.to_datetime(datetime.today()).date() q2 = """SELECT * FROM res_q1 t1 ...
0
votes
0answers
14 views

Python. Append data to existing table with PK's and maintaining current datatypes

I managed to successfully append data to a MySQL table with 2 PK's, however, when doing so, the existing datatypes change to the ones pandas chooses. This is what I am doing: 1- definition to add PK ...
0
votes
0answers
61 views

Update table statement not working in pandasql

I have a dataset with some null values I wanted to update to read missing instead of null I've looked through all the forums and documentation and can't seem to find a response from pandasql ...
2
votes
1answer
28 views

Pandas - How to make a groupment in which a new column is the result of (sum of a column)/(number of itens grouped)?

I need to make some kind of groupment in which a new column (result) is the sum of the values column divided by number of items found? Could anyone help me, please? For example: Table A +-------+--...
0
votes
0answers
21 views

Pandas dataframe column creation using group by and where condition [duplicate]

I have table with column names : user_id, status_type Where status_type can take two distinct values delete or add only. I want to create table with columns: user_id,status_type_add_count,...
1
vote
1answer
196 views

pandasql OperationalError: too many SQL variables

I'm trying to work with pandasql but something is not working properly. I execute the following code: from pandasql import sqldf, load_meat pysqldf = lambda q: sqldf(q, globals()) meat = load_meat() ...
-1
votes
2answers
47 views

pandasql: count occurrences of pairs

I was trying to count the number of matches that A and B have ever played, the dataset looks like this: so the number of matches team1 and team 29 have played is 2 as they each once were HomeTeam and ...
1
vote
1answer
44 views

Pandasql query of a list or dict of dataframes

Given a set of files - each one of which is read into a distinct dataframe - how might a pandasql query reference them? In the following snippet we have a list of dataframes: but the same question ...
2
votes
0answers
85 views

pandas.read_sql is extremely slow on python 3 kernel compared to python 2

I have a simple parameterized select query hitting an Oracle database via pyodbc connection and fetching data in a dataframe via pandas.read_sql. The code is super efficient and fast in Python 2 ...
0
votes
1answer
25 views

overwrite and append pandas data frames on column value

I have a base dataframe df1: id name count 1 a 10 2 b 20 3 c 30 4 d 40 5 e 50 Here I have a new dataframe with updates df2: id name count 1 a ...
0
votes
0answers
16 views

cx_freeze - the exe returns undescriptive error on sqldf

Trying to create exe for my script, i am using pandasql and sqldf to create sql queries against 2 csv files for matching.after the build, when i run the .exe file, s screen with list of errors is ...
0
votes
1answer
61 views

After importing pandasql as sqldf , getting 'module' object is not callable error

I imported the below : import pandasql as sqldf import pandas as pd import numpy as np from pandasql import load_meat, load_births pysqldf = lambda q: sqldf(q, globals())** meat=load_meat() when I ...
0
votes
2answers
192 views

using pd.read_sql() to extract large data (>5 million records) from oracle database, making the sql execution very slow

Initially tried using pd.read_sql(). Then I tried using sqlalchemy, query objects but none of these methods are useful as the sql getting executed for long time and it never ends. I tried using Hints....
2
votes
0answers
82 views

Using dictionary values in pandasql

I have a dictionary that has dataframes in values, something like this - mydict = {'demand': demand_df, 'supply':supply_df, 'prod': prod_df} Then I am using pandasql module to execute a simple query....
3
votes
1answer
1k views

Pandas Merge two rows into a single row based on columns

I have 2 rows that look like these, ------------------------------ DealName | Target | Acquirer | ----------------------------- ABC-XYZ | ABC | None | ------------------------------ ABC-XYZ |...
2
votes
1answer
400 views

PandaSQL very slow

I'm currently switching from R to Python (anconda/Spyder Python 3) for data analysis purposes. In R I used to use a lot R sqldf. Since I'm good at sql queries, I didn't want to re-learn data.table ...
2
votes
0answers
393 views

Python pandasql sqldf giving “too many SQL variables” when number of values (RxC) in dataframe is over 999

Background: I'm ultimately trying to join two tables that reside in completely different databases. I've loaded the data into dataframes with the intent of executing a SQL join through sqldf (both ...
0
votes
1answer
204 views

Multiprocessing Pandas SQL

I'm getting a table not found error when I utilize multiprocessing on chunked Pandas dataframe in each processor started by the mp library. I'm using pandasql library for SQL in the following manner: ...
0
votes
2answers
164 views

Joining two tables using pandasql

I am running SQL queries in Python using Pandasql. The queries x, y and z work fine but u gives an error x = pysql("select * from flight f left join iata i on f.ORIGIN = i.IATA;") y = pysql("select *...
0
votes
1answer
129 views

Error with query in pandasql

I am very new to PandaSQL and have never used it before. Here is my code up until now: import pandas as pd from pandasql import sqldf import numpy as np tasks = pd.read_csv("C:/Users/RMahesh/...
1
vote
0answers
40 views

How to iterate over pandas dataframe row in python and create new column based on values present in row

I have a patient data frame and sample looks like this. pat_id drug date lag_date lag_grace_dte 1234 A 1-Jan-17 1-Jan-17 11-Jan-17 ...
0
votes
0answers
581 views

Pandas Group by a Column and take the non null values in different rows for each column [duplicate]

I have an interesting question for which I am not getting any built-in solution in Pandas. I have data like: A B C D E F 1 null null Red null 12 1 89 76 Red null null 1 null ...
0
votes
2answers
758 views

Group by and Sum in Pandas without losing columns

I have a Dataframe that looks like this: -------------------------------------------------------------------- |TradeGroup | Fund Name | Contribution | From | To | | A | Fund_1 ...
0
votes
1answer
26 views

covert sql query output to json in python

I want the output of the query to be consumed by charts js. They require json/array format for values and labels. I am new to python and pandas. My code - import pandasql as pdsql str="""select ...
0
votes
0answers
29 views

How In Import Hex number into a Pandas Data Frame

I am using Pandas 0.22.0, Numpy 1.14.0 with Python 3.6 and I am importing data from MS SQL, one of the fields holds an OID value of 0x537AA051BEBB01A2 when I read this in to a data frame in Pandas it ...
-1
votes
1answer
206 views

i have an error in using pandasql .my table is not getting identified

error: PandaSQLException: (sqlite3.OperationalError) no such table: aadhaar_data [SQL: 'select registrar,enrolment_agency from aadhaar_data limit 50;'] (Background on this error at: http://sqlalche....
0
votes
2answers
104 views

How do I get today's date in SQLite?

I am trying to use Pandasql to query my dataframe. However, it is giving me an empty dataframe even though I know it shouldn't. I think it is because I have used today's date incorrectly in the where ...
0
votes
0answers
84 views

How do I use today's date in the where clause when using Pandasql in Python?

I am struggling to use some SQL functions in Pandasql as it seems that the same names aren't used for the functions. For example, using Pandasql in Python I type: q = """ select e.Date, e.SITA, e....
0
votes
0answers
148 views

pandasql int too large to convert error

I have data like the example data below. When I run the code below to get the count of distinct Product ID by prod cat, I’m getting the error below. Pandas groupby with nunique doesn't seem to have ...
0
votes
0answers
85 views

Both Cython and Numba, Pandasql sqldf select statement throws sqlite3.OperationalError: no such table

I am really new to Python programming. I have a dataframe pandasql query which runs fine when I run my code with the standard Python3 implementation. However, after cythonizing it, I always get the ...
2
votes
1answer
1k views

filter pandas data on specific index

I'd like to filter a dataframe based on specifics index. I've read things about query but I don't succeed. Here is the code which create my pivot table. I'd like to filter on specific members df ...
0
votes
1answer
179 views

pandasql::sqldf not capturing looping variable

I'm trying to loop over a list with pandasql::sqldf but this sqldf does not seem to be capturing the looping variable. Below is a stylized outline of my problem: import pandas as pd from pandasql ...
-1
votes
1answer
70 views

SQL query execution using panda library

I have an SQL query like this "select (ShipMode),(count(OrderID)*100/8994) as Score from friends.sampledatapanda(I have a CSV file, so ignore this) group by 1". Which I want to execute the same using ...
0
votes
1answer
50 views

pandasql: How to select non english named column?

Dataframe has non english named columns, how to select such columns? import pandasql as pdsql pysql = lambda q: pdsql.sqldf(q, globals()) sqlquery = 'select ''Машина'', min(''Дата доставки'') from ...
1
vote
1answer
221 views

Python: Get DISTINCT column values from inside CASE function in pandasql

I am trying to write a query in Python using pandasql. My code is as below, import pandas as pd from pandasql import * data = pd.read_csv('registerlog.csv') q = """ SELECT a.RegistrationMonth, ...
0
votes
1answer
190 views

pandasql EOL error while scanning string literal

I have the code below where I'm trying to use pandasql to run a sql query with sqldf. I'm doing some division and aggregation. The query runs just fine when I run it in r with sqldf. I'm totally ...
1
vote
0answers
361 views

Accessing the columns of pivot table in Python Pandas

I'm using a python pandas pivot. How can I get access the columns of pivot on new data frame? KM_pivot_first = pd.pivot_table(read_sql_KM, values=["IMPRESSIONS","ENGAGEMENTS"],index="PLACEMENT_ID",...
1
vote
1answer
4k views

Pandas merge on two columns using date and another column

Hello trying to merge two data frames and sum visit counts by date and upc. Transaction data (date,upc,sales) 200k rows x 3 columns Visits counts(date, upc, visit count) 2 million+ rows x 3 columns ...
1
vote
2answers
2k views

Outer Join Pandas Dataframe

I am trying to outer join (on df1) two pandas dataframe. Below are the sample data frames: df1: Index Team 1 Team 2 Team1_Score Team2_Score 0 A B 25 56 1 ...
-3
votes
1answer
48 views

Need to group sequence of letters while keeping the order in pandas or/and python

I have a dataframe : row1 col1 col2 1 U 1 2 U 1 3 U 1 4 D 1 5 D 1 6 U 1 7 U 1 When I did groupby sum I got : col1 col2 1 U 5 2 D 2 ...