GOAL - Demonstrate some cornerstone capabilities of configuring Hive access via Ranger
PREREQUISITE - Risk Analysis with Pig
SEE ALSO - This demo is based on these two publicly-available Hortonworks tutorials:
- Securing your Data Lake Resource & Auditing User Access with HDP Advanced Security
- Securing HDFS, Hive and HBase with Knox and Ranger
RECORDED DEMO
PRIOR DEMO CLEANUP - Cleanup
The Sandbox is provisioned with a user group called Marketing
which is made up
of the following individual user accounts; mktg1
, mktg2
and mktg3
. The
following "prevent Marketing
users from accessing ..." restrictive
use cases will be explored in this demo.
Type | Specifics |
---|---|
Table | geolocation_stage and trucks_stage |
Row | Where geolocation.event is not equal to normal |
Column | events and totmiles from risk_factor |
As we will see later in this demo, a Hive View will be needed to fully
implement the row-level security requirement. Logged into Ambari as
maria_dev
, run the following DDL.
CREATE VIEW geo_normal_event AS
SELECT * FROM geolocation
WHERE event = 'normal';
Ensure results are returned SELECT * FROM geo_normal_event LIMIT 100;
is executed.
Unfortunately, the HDFS home directory for the Marketing
users are not created. Using a
simple user provisioning process,
run the following steps to create a home directory for mktg
(we will not be using the other accounts in this demo).
HW13005:~ lmartin$ ssh [email protected] -p 2222
[email protected]'s password:
Last login: Fri Sep 22 14:59:22 2017 from 10.0.2.2
[root@sandbox ~]# su - hdfs
[hdfs@sandbox ~]$ hdfs dfs -mkdir /user/mktg1
[hdfs@sandbox ~]$ hdfs dfs -chown mktg1 /user/mktg1
[hdfs@sandbox ~]$ exit
logout
[root@sandbox ~]#
The Marketing
users are already created on the underlying Linux system, but
they need to be allowed to log into Ambari so they can use the Hive View.
Log into Ambari as admin
and go to the admin pulldown in upper-right corner,
select Manage Ambari > Users > Create Local User and then create accounts
for mktg1
, mktg2
and mktg3
(set password to password
for all three)
which should look like the following.
After creating the third user, click on the Views link in the left-side Views UI widget to see the list of views. Then toggle the Hive view and click on Hive View as highlighted below.
Scoll down to the Permissions widget and add the three mktgN
users as seen in the next screenshot.
As identified in the Sandbox splash page, open a browser on the Ranger UI at
http://127.0.0.1:6080 and login with raj_ops
/ raj_ops
credentials. Select
the Sandbox_hive link and then click on Policy ID link associated with
the "Hive Global Tables Allow" policy from the list.
Move the Policy Name's enabled
slider selection to disabled
and click
on the Save button at the bottom.
At this point, the Marketing
users should not be able see any tables. Log
into Ambari as mktg1
and within the Hive View validate no tables are listed
under the default
Databases table list as well as show tables;
returns
an empty list. Additionally, a security error should be display when
attempting to run a query.
GOAL: Prevent Marketing
users from accessing geolocation_stage
and trucks_stage
tables
Since we are starting with zero access to Marketing
, we will create a
single security policy to allow the team to view all tables except the
staging ones. Log into Ranger as admin
and navigate to Sandbox_hive
and then click Add New Policy to create a policy named "Default Mktg
Access to Trucking" that is configured to allow access to all tables in
the default
database except for the "staging" ones.
NOTE: The configurable value for the policy refresh is 30 seconds, so keep trying until new rules are accessible via Hive.
Now, back in Ambari's Hive View, the mktg1
user can retrieve results
for show tables;
as well as see the list of tables in the Databases UI
widget. More importantly, this user can now run queries on the tables
shown in the list (ex: SELECT * FROM driver_mileage LIMIT 100;
).
To validate the table-level restrictions are in place, attempt to run a query on one of the staging tables to verify a security error is presented.
GOAL: Prevent Marketing
users from accessing rows from geolocation
table when the event
column has a value other than normal
Row-level security is not an intrinsic feature of Hive yet, but Ranger does offer row-level filtering. As this HCC discussion indicates, the common approach to this need is to create a Hive View that selects only the data that you want to grant/restrict access to and then secure the View, and the underlying table, accordingly.
At the beginning of the demo we created a geo_normal_event
view that
the Marketing
team can access. Obviously, we did this ahead of time, but
in a model where the requirement comes along after the structures are in
place, one simply needs to create the view and then lock down access to
the backing table.
To satisfy this requirement with our current system, simply edit the
"Default Mktg Access to Trucking" policy in Ranger to include geolocation
along with the staging tables being restricted.
To verify, ensure mktg1
can select from geo_normal_event
, but not from geolocation
.
GOAL: Prevent Marketing
users from accessing the events
and totmiles
columns of the risk_factor
table
To implement this requirement we can simply modify the same Ranger policy to exclude access to these columns (with this configuration, these column names would be restricted from any table).
To verify, log into Ambari as mktg1
and verify the following message surfaces
when attempting to execute SELECT * FROM risk_factor LIMIT 100;
.
Verify all works correctly when selecting only allowed columns.
SELECT driverid, riskfactor FROM risk_factor LIMIT 100;
Return to the Hive policy list in Ranger and perform the following actions.
- Disable the "Default Mktg Access to Trucking" policy
- Enable the "Hive Global Tables Allow" policy