KSQL: Streaming SQL for Apache Kafka - 18-Oct-2017 11:18 - Rittman Mead Consulting

A few weeks back, while I was enjoying my holidays in the south of Italy, I started receiving notifications about an imminent announcement by Confluent. Reading the highlights almost (...I said almost) made me want to go straight back to work and check out all the details.
The announcement regarded KSQL: a streaming SQL engine for Apache Kafka!



Before going into detail, let's clarify the basics: what is KSQL? Why was it introduced, and how does it complement Kafka?

What is KSQL?

We have written about Kafka several times, including my recent blogs where I used it as a data hub to capture Game of Thrones tweets and store them in BigQuery in order to do sentiment analysis with Tableau. In all our examples Kafka has been used just for data transportation, with any necessary transformation happening in the target datastore (like BigQuery), using languages like Python, engines like Spark Streaming, or directly in a querying tool like Presto.

KSQL enables something really effective: reading, writing and transforming data in real time and at scale, using a language already known by the majority of the community working in the data space: SQL!


KSQL is currently available as a developer preview, but basic operations like joins, aggregations and event-time windowing are already covered.

What Problem is KSQL Solving?

As anticipated above, KSQL solves the problem of providing a SQL interface over Kafka, without the need for external languages like Python or Java.
However, one could argue that the same problem was previously solved by ETL operations on target datastores like Oracle Database or BigQuery. What is the difference in KSQL's approach, then? What are the benefits?

The main difference, in my opinion, is the concept of continuous queries: with KSQL, transformations are performed continuously as new data arrives in the Kafka topic. On the other hand, transformations done in a database (or big data platforms like BigQuery) are one-off: if new data arrives, the same transformation has to be executed again.


So what is KSQL good for? Confluent's KSQL introduction blog post provides some use cases like real-time analytics, security and anomaly detection, online data integration and general application development. From a general point of view, KSQL is what you should use when transformations, integrations and analytics need to happen on the fly on the data stream. KSQL provides a way of keeping Kafka as the unique data hub: no need to take data out, transform it and re-insert it into Kafka. Every transformation can be done in Kafka using SQL!

As mentioned before, KSQL is currently available as a developer preview and the feature/function list is somewhat limited compared to more mature SQL products. However, cases where very complex transformations need to happen can still be solved either via another language like Java, or with dedicated ETL (or a view) once the data has landed in the destination datastore.

How does KSQL work?

So how does KSQL work under the hood? There are two concepts to keep in mind: streams and tables. A Stream is a sequence of structured data; once an event is introduced into a stream it is immutable, meaning that it can't be updated or deleted. Imagine the items pushed into or pulled out of a warehouse: e.g. "200 pieces of ProductA were stocked today, while 100 pieces of ProductB were taken out".
A Table, on the other hand, represents the current situation based on the events coming from a stream. E.g. what's the overall stock quantity for ProductA? Facts in a table are mutable: the quantity of ProductA can be updated, or deleted if ProductA is no longer in stock.
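
As a minimal sketch of the two concepts (the topic, column names and the use of the SUM aggregate here are illustrative assumptions, not taken from the Confluent examples):

-- a stream of immutable stock movement events (negative quantities are items taken out)
CREATE STREAM stock_movements (product VARCHAR, quantity BIGINT) \
  WITH (kafka_topic='stock_movements', value_format='JSON');

-- a table holding the mutable "current stock" state, continuously updated per product
CREATE TABLE stock_levels AS \
  SELECT product, SUM(quantity) AS current_stock \
  FROM stock_movements \
  GROUP BY product;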


KSQL enables the definition of streams and tables via a simple SQL dialect. Various streams and tables coming from different sources can be joined directly in KSQL, enabling data combination and transformation on the fly.

Each stream or table created in KSQL is stored in a separate topic, allowing the usual connectors or scripts to be used to extract the information from it.

KSQL in Action

Starting KSQL

KSQL can work both in standalone and client-server mode, with the first aimed at development and testing scenarios and the second supporting production environments.
In standalone mode the KSQL client and server are hosted on the same machine, in the same JVM. In client-server mode, on the other hand, a pool of KSQL servers runs on remote machines and the client connects to them over HTTP.

For my test purposes I decided to use standalone mode. The procedure is well explained in the Confluent documentation and consists of three steps:

  • Clone the KSQL repository
  • Compile the code
  • Start KSQL using the local parameter:
./bin/ksql-cli local

Analysing OOW Tweets

I'll use for my example the same Twitter producer created for my Wimbledon post. You may notice I'm not using Kafka Connect; this is because KSQL doesn't support the Avro format as of now (remember, it's still in the developer preview phase). I therefore had to rely on the old producer, which stores the tweets in JSON format.

For my tests I've been filtering the tweets containing OOW17 and OOW (Oracle Open World 2017); as mentioned before, those arrive in JSON format and are stored in a Kafka topic named rm.oow. The first step is then to create a Stream on top of the topic in order to structure the data before doing any transformation.
The guidelines for the stream definition can be found here; the following is a cut-down version of the code used:

CREATE STREAM twitter_raw ( \  
  Created_At VARCHAR, \
  Id BIGINT, \
  Text VARCHAR, \
  Source VARCHAR, \
  Truncated VARCHAR, \
  ... 
  User VARCHAR, \
  Retweet VARCHAR, \
  Contributors VARCHAR, \
  ...) \
WITH ( \  
  kafka_topic='rm.oow', \
  value_format='JSON' \
  );

A few things to notice:

  • Created_At VARCHAR: Created_At is a timestamp; however, in the first stream definition I can't apply any date/timestamp conversion, so I keep it as VARCHAR, which is one of the allowed types (the full list is BOOLEAN, INTEGER, BIGINT, DOUBLE, VARCHAR, ARRAY<ArrayType> and MAP<VARCHAR, ValueType>).
  • User VARCHAR: the User field is a nested JSON structure; for the basic stream definition we'll leave it as VARCHAR, with further transformations happening later on.
  • kafka_topic='rm.oow': source declaration
  • value_format='JSON': data format

Once the first stream is created, we can query it in SQL like

select Created_at, text from twitter_raw  

with the output being in the form of a continuous flow: as soon as a new tweet arrives, it's displayed in the console.


The first thing I want to fix is the Created_At field, which was declared as VARCHAR but needs to be converted into a timestamp. I can do it using the STRINGTOTIMESTAMP function with the mask EEE MMM dd HH:mm:ss ZZZZZ yyyy. This function converts the string to a BIGINT, which is the datatype used by Kafka to store timestamps.

Another section of the tweet that needs further parsing is User, which as per the previous definition returns the whole nested JSON object.

{
"id":575384370,
"id_str":"575384370",
"name":"Francesco Tisiot",
"screen_name":"FTisiot",
"location":"Verona, Italy","url":"http://it.linkedin.com/in/francescotisiot",
"description":"ABC"
...
}

Fortunately KSQL provides the EXTRACTJSONFIELD function, which we can use to parse the JSON and retrieve the required fields.

I can now define a new twitter_fixed stream with the following code:

create stream twitter_fixed as  
  select STRINGTOTIMESTAMP(Created_At, 'EEE MMM dd HH:mm:ss ZZZZZ yyyy') AS  Created_At, \
    Id, \
    Text, \
    Source, \
    ..., \
    EXTRACTJSONFIELD(User, '$.name') as User_name, \
    EXTRACTJSONFIELD(User, '$.screen_name') as User_screen_name, \
    EXTRACTJSONFIELD(User, '$.id') as User_id, \
    EXTRACTJSONFIELD(User, '$.location') as User_location, \
    EXTRACTJSONFIELD(User, '$.description') as description \
  from twitter_raw

An important thing to notice is that Created_At is now encoded as a BIGINT, thus if I execute select Created_At from twitter_fixed I get only the raw number. To translate it back to a readable date I can use the TIMESTAMPTOSTRING function, passing the column and the desired date format.
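
For example, a query along these lines should return a readable date (the output format mask used here is just an illustrative assumption):

select TIMESTAMPTOSTRING(Created_At, 'yyyy-MM-dd HH:mm:ss') AS Created_At_String, Text \
from twitter_fixed;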

The last part of the stream definition I wanted to fix is the setting of KEY and TIMESTAMP: a KEY is the unique identifier of a message and, if not declared, is auto-generated by Kafka. However the tweet JSON contains the Id, which is Twitter's unique identifier, so we should use it. TIMESTAMP associates the message timestamp with a column in the stream: Created_At should be used. I can define both in the WITH clause of the stream declaration.

create stream twitter_with_key_and_timestamp \  
as \  
select * from twitter_fixed \  
with \  
(KEY='Id', TIMESTAMP='Created_At');

When doing a select * from twitter_with_key_and_timestamp we can clearly see that KSQL adds two columns before the others, containing the TIMESTAMP and KEY, and that the two are equal to Created_At and Id.


Now I have all the fields correctly parsed in a KSQL stream. Nice, but in my previous blog post I had almost the same for free using Kafka Connect. It's time to discover the next step of KSQL: tables!

Let's first create a simple table containing the number of tweets by user_screen_name.

create table tweets_by_users as \  
select user_screen_name, count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
group by user_screen_name  

When executing a simple select * from the table we can see the expected result.


Two things to notice:

  • We see a new row in the console every time a new record is inserted into the rm.oow topic; the new row contains the updated count of tweets for the screen_name selected
  • The KEY is automatically generated by KSQL and contains the screen_name

I can retrieve the list of tables defined with the show tables command.


It's interesting to notice that the format is automatically set as JSON. The format property, configured via the VALUE_FORMAT parameter, defines how the message is stored in the topic and can either be JSON or DELIMITED.

Windowing

When grouping, KSQL provides three different windowing functions:

  • Tumbling: Fixed size, non-overlapping. The SIZE of the window needs to be specified.
  • Hopping: Fixed size, possibly overlapping. The SIZE and ADVANCE parameters need to be specified.
  • Session: Dynamically sized; starting from the first entry for a particular key, the window remains active as long as new messages with the same key arrive within the INACTIVITY_GAP, which is the parameter to be specified.


I can create a simple table definition, like the number of tweets by location for each tumbling window, with

create table rm.tweets_by_location \  
as \  
select user_location, \  
count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
WINDOW TUMBLING (SIZE 30 SECONDS) \  
group by user_location  

In the output you can see that the KEY of the table contains both the user_location and the window timestamp (e.g. Colombes : Window{start=1507016760000 end=-}).

An example of a hopping window can be created with a similar query

create table rm.tweets_by_location_hopping \  
as \  
select user_location, \  
count(Id) nr_of_tweets \  
from twitter_with_key_and_timestamp \  
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS) \  
group by user_location;  

In the output it's interesting to notice that each entry (e.g. Europe North, Switzerland) is listed at least three times. This is because at any point in time there are three overlapping windows (SIZE is 30 seconds and ADVANCE is 10 seconds). The same example can be turned into session windows by just defining WINDOW SESSION (30 SECONDS), as in the sketch below.
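
As a sketch, the session-windowed version of the table would look like this (mirroring the hopping example above; the table name is just illustrative):

create table rm.tweets_by_location_session \
as \
select user_location, \
count(Id) nr_of_tweets \
from twitter_with_key_and_timestamp \
WINDOW SESSION (30 SECONDS) \
group by user_location;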

Windowing is a useful option, especially when combined with HAVING clauses, since it gives the option to define metrics for real-time analysis.
E.g. I may be interested only in items that have been ordered more than 100 times in the last hour or, in my Twitter example, in user_locations having a nr_of_tweets greater than 5 in the last 30 minutes.
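
A sketch of the Twitter case could look like the following (the table name and the choice of a 30-minute tumbling window are illustrative assumptions):

create table active_locations \
as \
select user_location, \
count(Id) nr_of_tweets \
from twitter_with_key_and_timestamp \
WINDOW TUMBLING (SIZE 30 MINUTES) \
group by user_location \
HAVING count(Id) > 5;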

Joining

So far so good: a nice set of SQL functions on top of data coming from a single source (in my case Twitter). In the real world, however, we'll need to mix information coming from disparate sources... what if I told you that you can achieve that in a single KSQL statement?


To show an integration example I created a simple topic known_twitters using the kafka-console-producer.

./bin/kafka-console-producer --topic known_twitters --broker-list myserver:9092

Once started, I can type in messages and those will be stored in the known_twitters topic. For this example I'll insert the Twitter handle and real name of known people who are talking about OOW. The format will be:

username,real_name  

like

FTisiot,Francesco Tisiot  
Nephentur,Christian Berg  

Once the rows are inserted with the producer, I'm able to create a KSQL stream on top of the topic with the following syntax (note the VALUE_FORMAT='DELIMITED'):

create stream people_known_stream (\  
screen_name VARCHAR, \  
real_name VARCHAR) \  
WITH (\  
KAFKA_TOPIC='known_twitters', \  
VALUE_FORMAT='DELIMITED');  

I can now join this stream with the other streams or tables built previously. However, when trying the following statement

select user_screen_name from rm.tweets_by_users a join PEOPLE_KNOWN_STREAM b on a.user_screen_name=b.screen_name;  

I get a nice error

Unsupported join logical node: Left: io.confluent.ksql.planner.plan.StructuredDataSourceNode@6ceba9de , Right: io.confluent.ksql.planner.plan.StructuredDataSourceNode@69518572  

This is due to the fact that as of now KSQL supports only joins between a stream and a table, and the stream needs to be specified first in the KSQL query. If I then just swap the two sources in the select statement above:

select user_screen_name from PEOPLE_KNOWN_STREAM a join rm.tweets_by_users b on a.screen_name=b.user_screen_name;  

...I get another error

Join type is not supportd yet: INNER  

We have to remember that KSQL is still in the developer preview phase; a lot of new features will be included before the official release.

Adding a LEFT JOIN clause (see the related bug) solves the issue and I should be able to see the combined data. However, when running

select * from PEOPLE_KNOWN_STREAM left join TWEETS_BY_USERS on screen_name=user_screen_name;  

I didn't retrieve any rows. After adding a proper KEY to the stream definition

create stream PEOPLE_KNOWN_STREAM_PARTITIONED \  
as select screen_name , \  
real_name from  people_known_stream \  
PARTITION BY screen_name;  

I was able to retrieve the correct rowset! Again, we are in the early stages of KSQL; these behaviours will be improved or better documented in future releases!
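
For reference, the working statement was along these lines (a sketch based on the re-partitioned stream defined above):

select * from PEOPLE_KNOWN_STREAM_PARTITIONED left join TWEETS_BY_USERS on screen_name=user_screen_name;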

Conclusion

As we saw in this small example, all transformations, summaries and data enrichments were done directly in Kafka, with a dialect that is very easy to learn for anyone already familiar with SQL. All the created streams/tables are stored as Kafka topics, thus the standard connectors can be used for sink integration.

As mentioned above, KSQL is still in developer preview, but the overall idea is very simple and at the same time powerful. If you want to learn more, check out the Confluent page and the KSQL GitHub repository!

Your Three Must-Know EPM and Analytics Themes from OpenWorld 2017 - 18-Oct-2017 05:38 - Performance Architects

Author: Kirby Lunger, Performance Architects

The enterprise performance management (EPM) and enterprise analytics (business analytics, BI, data visualization) arenas may have seemed like an afterthought at Oracle OpenWorld this year, since the conference focus areas seemed to be autonomous (self-learning) databases, blockchain and cloud infrastructure!  Never fear!  I’ve distilled the top three themes from these areas to bring you up to speed in this arena as quickly as possible.

  1. Everyday Low Prices: Walmart Pricing and (Less) Discounting Now Applies

One of the most attractive things about buying cloud solutions is that we can purchase “by the drop,” meaning that we can expand capacity (seats, CPUs, whatever) as we need them, at an affordable price.

For the last several years, Oracle was dinged by analyst firms like Gartner for maintaining a sales model that wasn’t optimized for the cloud world (aka: a very large field sales force used to long, protracted negotiations with a lot of off-the-list discounting).

The beast finally awoke!  Right around the time of OpenWorld, Oracle quietly slashed the prices of Oracle Analytics Cloud (OAC), their next-generation analytics platform combining BI, data visualization and analytics (Essbase!) capabilities, by 50%.  We believe this will lead to much clearer sales incentives and less required discounting on Oracle’s part moving forward…which should frankly make your life (and our experience as a partner as well) much easier.  As far as we know, this only applies to the PaaS arena at the moment, although we fully expect that this operating model will edge into the SaaS and IaaS areas as we look out into the coming fiscal year.

Oracle has also spent a lot of time over the last several months clarifying and narrowing down the list of offerings in each of its SaaS, PaaS, and IaaS focus areas.  You can go to the EPM arena, for example, and clearly see the list of modules and their prices with a “Purchase Now” button.  The days of trying to track down one of your 15 Oracle sales reps to get a quote are on their way out!

As part of this, product naming and grouping/bundling has been simplified.  You’ll notice the “Service” is getting dropped from most of the cloud product names (e.g., Enterprise Planning and Budgeting Cloud Service or EPBCS is now often referred to as “Enterprise Planning Cloud” or “EPC” on the Oracle website).

  2. The Dashboard is Dead: Information Before You Know You Need It Is the Name of the Game

Think about all of the notifications you receive from shopping sites based on your past and predicted shopping behavior.  You don’t painstakingly create a dashboard to analyze and report on your behavior…this just happens based on the data they collect on you and alerts are pushed to you on an “as-needed” basis.

And where do you consume your information?  On your mobile phone.  As of November 2016, mobile web traffic overtook computer web traffic, and 66% of all emails are now opened on a mobile phone!

We’ve all been saying this for years…and now it’s finally happening.  Using adaptive intelligence (Oracle’s term for what a bunch of other folks used to call “artificial intelligence”) and machine learning, Oracle is betting the ranch on proactive alerting throughout their solution stack to address this mobile-first, information-push world.

In the enterprise analytics field, they’re investing heavily in mobile enhancements for the visualization capabilities in OAC and to their mobile applications (Day-by-Day and Synopsis), while in EPM, they continue to focus on predictive modeling enhancements to the EPM product suite.

  3. Long Live the Data Lake: “Directionally Correct” is All the Rage

First of all, EPM folks, don’t freak out.  This doesn’t mean transactional systems are going away…we’ll be in a world where “hybrid” data storage models rule for many years to come. It just means that the world is moving to a “directionally correct” orientation instead of a “precisely wrong” focus.

What does this mean?  Rather than waiting for complete, tried-and-true, cause-and-effect historical relationships to be established in a traditional relational database structure that is then reported out in a dashboard, Oracle is using data lake technologies (which store data in its raw format until it is needed for analysis) to provide pointers to possible trends that could predict an outcome.

A centerpiece of this is the new OAC – Data Lake edition (which we think signals a shift away from the Oracle “Big Data Discovery” messaging of the past few years), which launched the week of OpenWorld.

That said…Oracle did announce a significant update to a more traditional EPM data management solution for you “structured data” fans, the Enterprise Data Management Cloud Service (also known as Enterprise Data Management Cloud).  This is the totally-rearchitected cloud version of the on-premise Oracle Hyperion Data Relationship Management (DRM), and is due out by the end of the 2017 calendar year.  It should make cloud-based, enterprise data management a lot easier.

Want to learn more about the evolution of Oracle’s EPM and enterprise analytics solutions and what this means for you?  Contact us at communications@performancearchitects.com and we’d be happy to set up a time to talk.

Patching Essbase in OAC to 17.3.3-926 with Oracle 11g RDBMS - 18-Oct-2017 05:21 - Performance Architects

Author: Andy Tauro, Performance Architects

I recently applied the 17.3.3-926 patch to Oracle Essbase Cloud Service (ESSCS) in Oracle Analytics Cloud (OAC). If you have not patched OAC yet, the good news is that this process is as close to “one-click” as it can get, with the capability to perform a “pre-check” of the instance (and while we have yet to see this fail, it is good to know that we can check to see if any known conditions for failure exist).

When the patching is initiated, it first performs a backup of the instance, just in case a rollback is needed. Once the backup is successful, the patching process begins. In our experience, this process takes about 30 minutes, including the backup. However, I fully expect that will vary depending on how much content is housed within the instance.

When I restarted the instance after the patch, I found that Essbase would not start. Digging through the log files on the instance, I found an error message that indicated that a “schema update” failed with the error message: “ORA-02000: missing ( keyword.”

Since the patching process in ESSCS is hidden from the customer, we contacted Oracle Support. After consulting with the product development team, we found out that the issue was because we were using Oracle Database 11g as our relational data store, a.k.a. DBaaS (Database-as-a-Service). By default, the update assumes that the latest version of the Oracle database (12c) is being used. To overcome this, Oracle Support provided a fix that required an update to a server script. The script being updated (setDomainEnv.sh) is found at the location “/u01/data/domains/esscs/bin” and Oracle Support advises backing up this script before making changes.

To change this script, one needs to change to the “Oracle” user with the command “sudo su - oracle.” Find the following words in the script: “-Dessbase.datasource=BIPlatformDatasource”

and replace it with “-Dessbase.datasource=BIPlatformDatasource -Dessbase.schema.update.disabled=true.”

Once the changes are made, log out of the instance and restart it using the “My Services” console. The updates will tell Essbase to skip the “Schema Update” step. As long as there isn’t anything else broken with the instance, this should allow Essbase to start up.

Oracle confirms that there is no functionality lost in using Oracle database 11g with ESSCS, and this issue will be fixed in patch 17.4, which is expected out in a couple of months. However, this stresses the importance of having a pre-production instance to verify that the deployed patches will not introduce issues with your setup. Since a PaaS like OAC allows for a lot of flexibility in how one uses one’s environment, it is important that such system changes are tested before they are released.

While the included pre-checks can test for conditions on the Essbase server itself, not every possible condition can be checked for once we consider the ways the instance can be connected to other services, whether in the Oracle Cloud or outside services like on-premise systems. That is because the tools included with OAC are robust and flexible enough to allow for solutions restricted only by one’s imagination.

ODTUG Elections – Vote Now! - 16-Oct-2017 11:19 - ODTUG
Elections for the 2018-19 ODTUG Board of Directors are underway — vote now! Exercise your right as an ODTUG member and vote for the board. This may be the most important thing you can do for ODTUG.
Proof that Less can be More - 11-Oct-2017 11:03 - Kevin McGinley

Snowflake Data Warehouse release 2.11 (September 7, 2017) included the addition of an OVERWRITE parameter for the INSERT command. INSERT OVERWRITE effectively consolidates two commands, TRUNCATE TABLE and INSERT, into one. INSERT OVERWRITE also deletes the file load history and retains access control privileges on the target table; both of which are not insignificant details.
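
In other words, a single statement replaces the old two-step pattern; a minimal sketch with hypothetical table names:

-- before: two statements
truncate table my_table;
insert into my_table select * from my_staging_table;

-- after: one statement (which also retains access control privileges on my_table)
insert overwrite into my_table select * from my_staging_table;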

Give it a try by creating a table and populating it with two records:

create table test_insert_overwrite(
id int
,first_name varchar2(250)
,last_name varchar2(250))
;
insert into test_insert_overwrite values(1,'Wayne','Campbell'),(2,'Garth','Algar');

Verify that the two records are present:

select * from test_insert_overwrite;
+----+------------+-----------+
| ID | FIRST_NAME | LAST_NAME |
+====+============+===========+
|  1 | Wayne      | Campbell  |
+----+------------+-----------+
|  2 | Garth      | Algar     |
+----+------------+-----------+
2 Row(s) produced. Time Elapsed: 1.038s

Now add the OVERWRITE option to the INSERT command:

insert overwrite into test_insert_overwrite values (3,'Noah','Vanderhoff');

Confirm the original two records have been removed and there is now one record in the table:

select * from test_insert_overwrite;
+----+------------+------------+
| ID | FIRST_NAME | LAST_NAME  |
+====+============+============+
|  3 | Noah       | Vanderhoff |
+----+------------+------------+
1 Row(s) produced. Time Elapsed: 1.930s

The Query Profile for the INSERT OVERWRITE command used above shows a two-step process: first a TRUNCATE command, followed by the INSERT.

While there are multiple methods to achieve the desired result, using one statement to perform a “full load” is often the most efficient for many use cases.


Proof that Less can be More was originally published in Red Pill Analytics on Medium, where people are continuing the conversation by highlighting and responding to this story.

Updating Oracle EPM (Hyperion) for Struts RCE Vulnerability - 11-Oct-2017 05:33 - Performance Architects

Author: Andy Tauro, Performance Architects

It is hard to not have heard of the Apache Struts vulnerability that affected major websites, including those of Equifax. When this happened, IT support teams scrambled to check their environments and to apply fixes as quickly as they could be found. Our customers running Oracle EPM (Hyperion) (as well as Oracle Business Intelligence Enterprise Edition or OBIEE) on-premise inquired about how to address this as well.

From what we can tell, certain Oracle EPM components use Struts via the WebLogic Application Server (WAS or WLS) 10.3.6 platform. While WAS 10.3.6 comes bundled with Struts 1.x & 2.x, it looks like Oracle EPM only uses Struts 1.x. Although this particular version of Struts has not been named in the vulnerability that caught the world by storm, Oracle released a fix for WebLogic that updates the version of Struts via a Smart Update (SU) patch.

The patch (26835212) updates the Struts version in WebLogic 10.3.6 to 1.3.9 and 2.3.34. We have applied this over the last security Patch Set Update (PSU) that Oracle released (25869650, in July 2017). To apply this, one would use the BSU utility that is available in the “Middleware Home” directory ({MW_HOME}/utils/bsu). The general steps for this are:

  1. Stop EPM services running
  2. Extract patch into the “{MW_HOME}/utils/bsu/cache_dir” directory
  3. From the location “{MW_HOME}/utils/bsu,” apply the patch as: “bsu.cmd/sh -install -patch_download_dir={MW_HOME}/utils/bsu/cache_dir -patchlist={PATCH_ID} -prod_dir={MW_HOME}/{WL_HOME}” and patch ID for July 2017 PSU = B25A, for 25869650 = UZCY
  4. Delete “cache” and “tmp” directories for each WebLogic Managed Server that runs Oracle EPM services
    • These will be found in the location “{MW_HOME}/user_projects/domains/EPMSystem/servers”
  5. Restart Oracle EPM

One important note: “MW_HOME” equals “Middleware Home” and “WL_HOME” equals “WebLogic Home” (usually “{MW_HOME}/wlserver_10.3”).  Also, these steps apply primarily to Oracle EPM Version 11.1.2.4.

At this time, we have run basic tests on the various Hyperion modules and have not seen any issues. We are working with our clients to perform more thorough health checks, since every environment has some nuance that makes it unique and sometimes exposes an issue that was not thought about.

Are you thinking of applying this patch and have questions? Feel free to drop a note as we would love to hear from you and share notes on this.

We are at that time of the year again…
Like last year, this is my contribution for the ODC Appreciation Day (#thanksODC), promoted by Tim Hall.

My post starts with a little rant: OTN is gone, the new name is now ODC. I’m sure there are some really smart marketing people behind that name change, but “killing” a brand known and used for ages to try to promote something new/different is the kind of thing I don’t like a lot (isn’t the new thing good enough to be known by itself?).
I’m still calling things OTN just because that’s what it is in my mind and how most of us know it.

OBIEE 12c “lightweight” login page and VA/DV customization

This post isn’t really about a functionality but more about a missing/removed functionality. I saw people asking about it (in OTN forums … sorry: ODC forums) so I thought it was worth to write about it explaining things a bit.


The lightweight login page

With OBIEE 12.2.1.3.0 there is, by default, a new login page called “lightweight SSO”. It was already possible to enable it in 12.2.1.2.0, and the role of this new page is to provide a shared session between OBIEE Answers/Dashboards (the classic interface) and Visual Analyzer.

It has a different style compared to the “old” classic login page, and it also completely ignores any custom style you may have deployed in OBIEE. In the old way of doing things a custom style was a simple and supported way of branding the interface to make OBIEE totally yours, even if just by deploying a custom “oracle_logo.png” picture to have your own logo in the top-left header.

"old" classic OBIEE login (#thanksODC)

The “old” classic login page uses oracle_logo.png

When talking about the custom style not affecting the new login page in OBIEE, how can you not think about Visual Analyzer?


VA/DV has it’s own logo and name, not using your OBIEE custom style

Logo, name, colors: nothing coming from your own custom style is used and you get 100% Oracle branding.

How to customize the style of these elements?

Let’s inspect the new login page a bit to find out what options you can have…


The new OBIEE lightweight login page logo isn’t a picture anymore

First surprise: there is no picture at all where the logo is placed. Inspecting the HTML shows an empty tag. Wait a second: not really empty…
There is a CSS rule setting the “content” property of this tag, which is a way to add content to the page. The content can be static text, a picture or a few other things; in this case it is just one character encoded using Unicode: \e666. This isn’t really a readable (Latin) character.

Can \e666 be the logo?


\e666 the Oracle logo

How is it possible that \e666 (any reference to 666 – the number of the beast is purely coincidental?) is visible on the screen as the Oracle logo? When checking the style of the DIV element itself you can see it has a custom font-family defined: “vafont”. This for sure isn’t a standard font (like Arial, Verdana, Times New Roman etc.) and I don’t have it installed on my system.


The login page loads a custom font (used by the CSS)


When you remove ‘vafont’ as the font-family for that element, no logo anymore

If you disable just that attribute in the CSS, the Oracle logo disappears and a square is visible instead (as I said, \e666 isn’t really a readable character). As expected, “vafont” is loaded by the page itself and it isn’t a system font at all.

Not as simple to customize as an OBIEE custom style…

That’s the main reason why it’s more difficult to customize that login page, or the VA/DV interface and put your own logo there instead: it isn’t a picture anymore!
The easiest way would be to edit the CSS to redefine the “content” attribute and link to your own logo stored somewhere accessible (by a url), for example to your “oracle_logo.png” deployed by your classic OBIEE custom style.

The main problem with this approach is that, so far, there isn’t an officially supported way of doing it, so you would need to take the EARs containing these deployments, edit the CSS and redeploy the archives.
At the next patch or upgrade you start again, as there is a good chance the EAR has been replaced by an Oracle default one.


In the CSS it’s possible to point to the custom style oracle_logo.png instead of \e666

VA/DV uses a different custom font (bi-alta); I didn’t check if it’s exactly the same one with a different name, but the logo is still \e666.

What other icons are included in the bi-alta font?

I was curious to see what else was included in that font; here’s the full list I got (and you can see many pieces of the VA/DV interface are there):


A view of all the icons contained in the bi-alta font loaded by VA/DV

The post ODC Appreciation Day: Customize style of lightweight login and VA/DV appeared first on Gianni's world: things crossing my mind.

ODC Appreciation Day: OBIEE's Time Hierarchies - 10-Oct-2017 02:58 - Rittman Mead Consulting

After last year's successful OTN Appreciation Day, it's time again to show our love for a particular feature in any Oracle tool we use in our work. You may have noted a name change, with OTN now becoming ODC: Oracle Developer Community.

What

The feature I want to speak about is OBIEE's Time Hierarchies.
For anybody in the BI business the time dimension(s) are the essence of the intelligence bit: being able to analyze trends, compare the current period with the previous one, or plot year-to-date or rolling measures are just some of the requirements we get on a daily basis.
A time hierarchy definition allows the administrator to set which time levels are exposed, how the rollup/drill down works and how previous/following members of the level are calculated.
Once the hierarchy is defined, all the related calculations are as simple as calling a function (e.g. AGO), defining the level of detail necessary (e.g. Month) and the number of periods to take into account.
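
For example, a "revenue one month ago" or a "year-to-date revenue" measure ends up being a single logical expression of this form (the logical table and column names here are hypothetical):

AGO("Sales"."Revenue", "Time"."Month", 1)
TODATE("Sales"."Revenue", "Time"."Year")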

A Time hierarchy definition is necessary in the following cases:

  • Time comparisons - e.g. current vs previous month
  • Time related rollups - e.g. Year to date
  • Drill path definition - e.g. Year-Month-Day
  • Fact Tables at various level of details - e.g. daily fact table and monthly pre-aggregated rollup
  • Time related level based measures - e.g. monthly sum of sales coming from a fact table at daily level

Why

Why do I like time hierarchies? Simple! It's a very clever concept in the RPD, which requires particular knowledge and dedicated attention.

If done right, once defined it is available in every related table and makes the time comparison formulas easy to understand and to create. If done wrong, errors or slowness in the related analyses can be difficult to spot and improve/fix.

Still, time hierarchies are a central piece of every BI implementation, so once understood and implemented correctly they give a massive benefit to all developers.

How

We blogged about time dimensions and calculations back in 2007 when OBI was still on version 10! The original functionality is still there and the process to follow is pretty much the same.
Recently the concept of the Logical Sequence Number was introduced, a way of speeding up some time series calculations by removing the ranking operations needed to move back (or forward) in history.


I wanted to keep the blog post short, since information about time hierarchies can be found in millions of blog posts. I just wanted to give a few hints to follow when creating a time hierarchy:

  • It can be created on any data with a predefined order; it doesn't need to be a date! You could compare, for example, a certain product with the one in the inventory having the previous code.
  • The Chronological Key defines the sorting of the level, for example how years, months or dates are ordered. Ordering months alphabetically with a format like YYYY-MM is correct, while using MM-YYYY provides wrong results.
  • Double check the hierarchies: something like YEAR -> MONTH -> WEEK -> DATE can be incorrect since a week can be split across different months!
  • Set the number of elements appropriately for each level. This is useful, especially when the hierarchy is complex or there are pre-aggregated facts, for OBIEE to understand which table to query depending on the level of the analysis.
  • Set up the Logical Sequence Numbers. LSNs are useful if you are looking to reduce the impact of time series processing to a minimum.
  • If you are looking for optimal performance for a specific report, e.g. current year vs previous, physicalizing the time series result (previous year) directly in the table alongside the current year will give you what you're looking for.

This was just a quick overview of OBIEE's Time Hierarchies, why they are so useful and what you should look out for when creating them! Hope you found this short post useful.

Follow the #ThanksODC hashtag on Twitter to check which posts have been published on the same theme!

General guidelines for Customization, Configuration and Administration

 

SSO Authentication

Implementers can refer to the steps to integrate SSO with IDM as explained in ‘Security Guide for Oracle Business Intelligence Enterprise Edition’.

FA Security Metadata Extraction Steps

Identify the Data Objects to secure, e.g.:

  • Supervisor Person, Department, Position, etc. are typical Data Security objects
  • HR Line Manager, Payroll Administrator, etc. are typical Object Security Roles*

Fusion Apps (FA) Source Objects

Identify the Data Security Roles and Object Security Roles (AoR, or Area of Responsibility, roles) associated with these objects.

For data security, the implementer needs to identify the tables containing the User to Data Object ID mapping.

Finalize the list of SQLs for the above requirements.

Form SQLs to extract the Data and Object Security data from the source. An example target format is shown in the next section (Creating warehouse side security metadata tables extracted from FA).

Security Table Object List

Some of the Fusion Apps security tables and the SQLs to extract security information are provided in the appendix. Additional data objects can be obtained from the Fusion Apps implementation team as required.

 

*Note: OLAP and OLTP roles may not necessarily converge on most occasions. Extract only those roles which make business sense for BI, and not just every role present in Fusion Apps.

 

FA Security Metadata ETL to Warehouse

Creating warehouse side security metadata tables extracted from FA

Data extracted from the FA security tables must be loaded in a certain format for ease of querying by the RPD initialization blocks.

 

  • As an example, for on-prem customers we could possibly use three tables to store the data effectively. The table structure is as follows; the columns and tables can be modified or extended based on additional requirements.

    ◦ Roles Table (role type: Data Security or Object Security): w_sec_roles

      Attributes: Role Name, Role Description, Role Type, DSN

      Role Name      Role Description          Role Type                     DSN
      DEPT_LINE_MGR  Department Line Manager   Area of Responsibility (AOR)  210
      DEPT_TOP_MGR   Department Top Manager    Data Security (DS)            210

 

    ◦ Roles + Users Table: w_sec_role_users

      Attributes: Username, Role Name, DSN

      USERNAME   Role Name      DSN
      JOHNDOE    DEPT_LINE_MGR  210
      GWILLIAMS  DEPT_TOP_MGR   210

 

    ◦ User + Data Object Table: w_sec_user_objects

      Attributes: Username, Object Name, Object ID, DSN

      USERNAME   Object Name  Object ID  DSN
      JOHNDOE    DEPARTMENT   1002       210
      GWILLIAMS  SUPERVISOR   31052      210

Note here that Object refers to the Securing Object to be used on the BI side.
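
A minimal DDL sketch for the three tables described above (the column datatypes and lengths are assumptions and should be adapted to the actual warehouse standards):

create table w_sec_roles (
  role_name        varchar2(200),
  role_description varchar2(400),
  role_type        varchar2(30),
  dsn              number
);

create table w_sec_role_users (
  username   varchar2(200),
  role_name  varchar2(200),
  dsn        number
);

create table w_sec_user_objects (
  username     varchar2(200),
  object_name  varchar2(200),
  object_id    number,
  dsn          number
);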

 

    ◦ Useful SQLs based on the above tables containing the extracted security data

      User to Data Security Role:

      select rol_usr.username, rol.role_description
      from w_sec_roles rol, w_sec_role_users rol_usr
      where rol.role_name = rol_usr.role_name
      and rol_usr.dsn = rol.dsn
      and rol.role_type = 'DS'

      User to Data AOR Role:

      select rol_usr.username, rol.role_description
      from w_sec_roles rol, w_sec_role_users rol_usr
      where rol.role_name = rol_usr.role_name
      and rol_usr.dsn = rol.dsn
      and rol.role_type = 'AOR'

      User to Object (Supervisor):

      select obj.username, obj.object_id
      from w_sec_role_users rol_usr, w_sec_user_objects obj
      where rol_usr.username = obj.username
      and rol_usr.dsn = obj.dsn
      and obj.object_name = 'SUPERVISOR'

      User to Object (Department):

      select obj.username, obj.object_id
      from w_sec_role_users rol_usr, w_sec_user_objects obj
      where rol_usr.username = obj.username
      and rol_usr.dsn = obj.dsn
      and obj.object_name = 'DEPARTMENT'

    ◦ NOTE

      Data Security objects like Department and Supervisor are hierarchical objects.

      OBIA expects the following in the w_sec_user_objects table:

        • Provide the top department and all its child department IDs for the assigned user. Only top dept ID
