Questions tagged [hive]

Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible distributed file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL.

0
votes
0 answers
12 views

Join performance between bucketed and non-bucketed tables

I came across a scenario today: suppose I have 3 tables (T1, T2, T3), of which T1 (say 23 buckets) and T3 (say 12 buckets) are bucketed. We have join operations between T1 & T3, T2 & ...
-2
votes
0 answers
19 views

Which file format is best for a Hive table that will only be used from Spark?

I want to access a table only from Spark. In which format should I store the table data in Hive (ORC, Parquet, Avro, CSV, text, or sequence), and why? I am not going to use this table anywhere ...
-1
votes
0 answers
21 views

Create Hive external table using Parquet file schema

How can I create a Hive external table automatically by reading a Parquet file's schema? Will there be any data type mismatch while creating it? I have loaded the Parquet file into a DataFrame and extracted the schema ...
0
votes
0 answers
19 views

Does standalone metastore 3.0 need Hadoop?

I was just trying to set up Standalone Metastore 3.0, but it seems that it also requires Hadoop. My understanding was that the whole point of the standalone metastore is that it is just a service which doesn't ...
0
votes
0 answers
10 views

Column deletion in Hive without code change?

We have a table with 4 columns. Suppose that in the future we no longer need 2 of them: how can we remove them without making any code changes in a real-time project?
0
votes
1 answer
22 views

Calculating Rolling Weekly Spend in Hive using Window Functions

I need to develop a distribution of customer week long spend. Every time a customer makes a purchase, I want to know how much they've spent with us in the past week. I would like to do this with my ...
0
votes
1 answer
16 views

Delete partition with non-constant value in Hive

I want to delete a partition in Hive with its value being in another table or being created by a function on-the-fly. For example: ALTER TABLE table_1 DROP IF EXISTS PARTITION (dt = ...
0
votes
0 answers
25 views

Getting the word count after splitting a URL into an array of words

I have a table with a list of URLs url http://03cubsml.baseball.cbssports.com/stats/stats-main?selectedplayer=2122997 http://08flb.baseball.cbssports.com/scoring/standard http://100-poems.com/poems/...
1
vote
0 answers
20 views

How to speed up this query to retrieve lastUpdateTime of all hive tables?

I have created a bash script (GitHub Link) to query for all hive databases; query each table within them and parse the lastUpdateTime of those tables and extract them to a csv with columns "tablename,...
0
votes
0 answers
23 views

Copying Date ranges from different columns of different source tables to columns of target table in Hive

I have a requirement to copy date ranges from 2 source hive tables into a target hive table. Input table1 with sample data shown below: Table1 p_id fin_period sn_period 12345 MAR-19 OCT-18 ...
0
votes
0 answers
10 views

Issue with non-printable char while exporting CSV data (Windows-1252) to hive

I am trying to create a hive table on top of following CSV dataset using OpenCSVSerde WITH SERDEPROPERTIES ("quoteChar"='\"', "separatorChar"=',') but the hive table is losing the £ sign, and ...
1
vote
0 answers
15 views

Avro files created using Spark and having DecimalType fields

I created Avro datafiles using spark2 and then defined a hive table pointing to the avro datafiles. val trades= spark.read.option("compression","gzip").csv("file:///data/nyse_all/nyse_data").select($"...
1
vote
0 answers
19 views

Hive Warehouse Connector + Spark = signer information does not match signer information of other classes in the same package

I'm trying to use hive warehouse connector and spark on hdp 3.1 and getting exception even with simplest example (below). The class causing problems: JaninoRuntimeException - is in org.codehaus....
1
vote
1 answer
25 views

Is there a way to merge ORC files in HDFS without using ALTER TABLE CONCATENATE command?

This is my first week with Hive and HDFS, so please bear with me. Almost all the ways I saw so far to merge multiple ORC files suggest using ALTER TABLE with CONCATENATE command. But I need to merge ...
0
votes
0 answers
15 views

Snapshot of each id and amount for different date ranges

For a specific date, I need the snapshot for each id and total amount starting with date and then adding 4,8,12,16,20 weeks (6 months). So need the total amount for each of date, date + 4 weeks, date+...
0
votes
0 answers
12 views

Can't drop a table in Hive

I am trying to drop a table in hive but I see a strange error as shown below: DROP TABLE tom.employee; Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache....
0
votes
2 answers
38 views

Hive: INSERT INTO seems to be overwriting the existing table

I have a data set with around 13,000 records and I am trying to insert another data set with around 13,000 records into the first table. I am not receiving any error messages, but the resulting table ...
0
votes
0 answers
7 views

SerdeProperties not showing correctly in Hive

Hive table does not display the field.delim and serialization.format correctly when executing show create table. For example field.delim value '\u0001' is displayed as an empty string. Is there a ...
0
votes
0 answers
14 views

How to extract a huge number of rows (more than 10000) from Hive to NiFi incrementally?

I am trying to get all the rows from Hive into NiFi in small, incremental batches: for example, rows 1-100 in FlowFile-1, 101-201 in FlowFile-2. I know it's possible by max rows in flowfile. ...
-1
votes
1 answer
37 views

How to get the latest 3 months of data by default from a complete data set

I have a full-year data set; I have developed a Power BI report on it and scheduled it. I need to show the last 3 months of data every time. Column a column b column c a 1 2019-01-...
1
vote
5 answers
46 views

Count elements and find the maximum

I've got a table like this: +-----+-----+-----+ | uid | aid | tid | +-----+-----+-----+ | 1 | 6 | 7 | +-----+-----+-----+ | 2 | 6 | 7 | +-----+-----+-----+ | 3 | 5 | 7 | +-----+-----...
1
vote
1 answer
23 views

DataFrame.write.parquet - Parquet-file cannot be read by HIVE or Impala

I wrote a DataFrame with pySpark into HDFS with this command: df.repartition(col("year"))\ .write.option("maxRecordsPerFile", 1000000)\ .parquet('/path/tablename', mode='overwrite', partitionBy=["...
0
votes
1 answer
14 views

Hive configuration hive.stats.fetch.partition.stats does not exist

I am using Hive version 3.1.1, and when I try to set hive.stats.fetch.partition.stats=true I get the following error. Is hive.stats.fetch.partition.stats not available in this Hive version? Query ...
0
votes
1 answer
14 views

How to delete fields from a partitioned table in Hive stored as Parquet?

I'm looking for a way to modify a Parquet data table in Hive to remove some fields. The table is managed, but that doesn't matter because I can convert it to external. The problem is that I cannot use ...
0
votes
1 answer
24 views

How can I pass all attributes from an XML to a flowfile?

I have a NiFi flow that inserts values from an XML into a Hive table. I need to do that XML evaluation automatically because it has a lot of values. Right now I'm doing that by ...
0
votes
1 answer
12 views

Beeline/Hive2 variables passed from script getting truncated

I have a script from which I pass parameters to Hive variables. The entire flow is as below. One of the Hive variables is getting truncated. cat << script.sh beeline -u "$HIVESERVER" \ -f $...
0
votes
0 answers
14 views

FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

I am getting the error given below. Can anyone help solve it? (base) [email protected]:/usr/local/apache-hive/$ hive /usr/local/hadoop/libexec/hadoop-functions.sh: line 2364: HADOOP_ORG.APACHE.HADOOP....
0
votes
1 answer
21 views

How to get text bytes used by a string in Hive?

I have some data in a Hive 1.2.1 table. I have to get the raw bytes of a specific column. The column data is raw HTML in multiple languages. To get the length in characters, I can use a simple query like ...
0
votes
0 answers
13 views

Checking if a partition exists in a Hive table

I have to query Hive tables to find out whether a partition exists for a particular date. I have tried several commands, but I am stuck at one point. My Hive tables can be partitioned on multiple columns....
0
votes
1 answer
15 views

Redshift Spectrum and Hive Metastore - Ambiguous Error

From Redshift, I created an external schema using the Hive Metastore. I can see the Redshift metadata about the tables (such as using: select * from SVV_EXTERNAL_TABLES), however when querying one of ...
0
votes
1 answer
11 views

Create a table using a subquery in Hive

I want to create a table using a subquery in Hive: WITH subquery AS (SELECT dpspm.dpspm_epi_id AS person_identifier, hatmf.dmeme_ck AS meme_ck, ...
2
votes
2 answers
44 views

Spark Partitioning Hive Table

I am trying to partition the hive table with distinct timestamps. I have a table with timestamps in it but when I execute the hive partition query, it says that it is not a valid partition column. ...
1
vote
1 answer
56 views

Merge two columns but with different structure in hive

I have loaded a parquet file and created a Data frame as shown below ---------------------------------------------------------------------- time | data1 | data2 ----------------------...
0
votes
2 answers
31 views

Hive: Extract Data From Nested JSON and Append

I have a hive table with IDs and JSON such as below: id json ---------- 21 | {"temp":"3","list":[{"url":"aaa.com"},{"url":"bbb.com"}]} 42 | {"temp":"2","list":[{"url":"qqq.com"},{"url":"vvv.com"}]} ...
0
votes
0 answers
17 views

I want to plot a histogram but I got the message 'Column' object is not callable

I want to plot a histogram and then fit a lognormal distribution to the data, but I get the message TypeError: 'Column' object is not callable. I obtained the data from a schema in Hive. data=...
1
vote
1 answer
21 views

Hive SQL: adding (not appending) seconds to an existing timestamp

I am trying to add seconds to an existing timestamp to find the end time. I have the start time in one field and the total duration in another. How can I add them? I tried to use the date_add function, but that ...
1
vote
1 answer
19 views

Hive SQL Distinct Column Syntax Error when calling multiple columns

After using a WITH clause and series of inner joins, I attempted to call back three columns: Employees, SalesID and a COUNT(DISTINCT) and encountered a Syntax Error. This is for a hadoop environment ...
1
vote
0 answers
15 views

How to change the Hadoop temporary working directory from /tmp to another folder

I am using Hive and I want to change the MapReduce temporary working directory from /tmp to some other directory. I tried everything I could find on the internet, but nothing works. I can see by ...
0
votes
1 answer
15 views

How to query data from multiple Hive tables having a similar naming pattern?

It is my maiden voyage into Hive. I have multiple Hive tables, like snapshots with names as follows: revenue_20110131 reveue_20110228 revenue_20110331 purchases_qrt1 purchases_qrt2 purchases_qrt3 ...
1
vote
2 answers
45 views

Add some lines at the top of hive table

I have a table of this form in hive (Before): AB_dimp|SF_0060H00000nhSrmQAE|EBA Order 1127735|Execute|New From AB_dimp|SF_0060H00000nhSwkQAE|EBA Order 1127725|Execute|New From AB_Dimp|...
0
votes
2 answers
37 views

Hive query is not returning output

My Hive query returns no data (0 rows). I need to retrieve the records from the last month up to the current date from the table. select * from table1 where date_format(order_date,'yyyy-MM-dd') ...
0
votes
0 answers
16 views

Impala to Kudu load error - Cast column from string to decimal gives error

I am loading data from an Impala table to a Kudu table via the Impala editor. The Impala table is a staging table between files loaded to HDFS, i.e. I use a LOAD DATA INPATH query to load my Impala staging ...
0
votes
0 answers
11 views

Can I execute the hive -f command from a Python subprocess?

subprocess.Popen(["hive","-f","script_file.txt"]) gives "No such file or directory". So I tried subprocess.Popen('hive -f script_file.txt', shell=True), which says it finished but does not execute the Hive queries ...
0
votes
0 answers
11 views

Unable to initialize thriftserver

I'm trying to start thrift with derby using start-thriftserver.sh. I have this error and I can't understand why. starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /path-...
0
votes
3 answers
25 views

SQL code to filter on a range of days only, not dates

I am trying to execute the code but it throws an error. I need the days to be between 7 and 10. I have already tried running it, but it is not working so far: select * from hive.entity....
-3
votes
0 answers
21 views

Error: Table 'CTLGS' already exists (state=42S01,code=1050) [on hold]

schematool -dbType mysql -initSchema --verbose hive> (base) [email protected]:~$ schematool -dbType mysql -initSchema --verbose SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found ...
0
votes
0 answers
20 views

Dynamic loading issue on partitioned table

I am getting an error when dynamically loading into a partitioned table in Hive. First I created a non-partitioned table dept as given below: create table dept(id int, name string) row format ...
0
votes
0 answers
12 views

How to disable org.apache.hadoop.hive.common.type.HiveVarChar info in Solr result?

I am setting up an elastic search engine using Solr. I have a few columns where the datatype is VARCHAR(). The column in the result below (image uploaded) is an alphanumeric document ID. So when I use ...
0
votes
0 answers
14 views

I am not able to see my Hive and HBase tables from Apache Atlas

I am using HDP-2.6.0.3 and Ambari version 2.6.2.2. I have HBase, Kafka and Ambari Infra as well. Now I have installed Apache Atlas. In the Atlas UI I am not able to see any Hive or HBase tables. I try to ...
1
vote
1 answer
28 views

Hive bucketed table doing exchange and sort step in physical plan

I have two tables, both clustered on the same columns, but when joining the tables on the clustered columns the execution plan shows both exchange and sort steps. Both tables are bucketed on the ...