Author: Doug Ross, Performance Architects 

Introduction

As more organizations move their business intelligence (BI) environments to the cloud, loading and accessing enterprise data will become as important as the front-end visualizations.  Oracle’s BI Cloud Service (BICS) offers several options for those data requirements that go beyond simple data upload. Each has a specific purpose, features, benefits, and limitations. One of the more powerful options for true enterprise-level data transfer to the cloud is Oracle’s Data Sync tool.

Data Sync Overview

Data Sync provides a full-featured data transfer tool with a client interface that allows for scheduling load jobs that efficiently move data from flat files, database tables, and other cloud data sources into the BICS Database Schema Service or Oracle Database Cloud Service.  It can also directly load data as a data set source for the Visual Analyzer projects that are available in BICS.

It includes many of the features found in other data loading tools: logging of load job execution steps, restarting after failures, incremental loading of new or modified data, and configuring the sequence of load operations.  A Data Sync solution lends itself either to new development of data load processes or to a more agile analytics environment in which processes and data models change more rapidly than would be practical with an on-premise database.

Data Sync Configuration Steps

Data Sync’s primary function is to upload data into a BICS environment.  Data can be loaded from flat files (CSV or XLSX), relational database sources (tables, views, or dynamically executed SQL statements), Oracle Transactional BI (OTBI), JDBC data sources (e.g., MongoDB, Impala, SalesForce, RedShift), or the Oracle Service Cloud.  Uploaded data can be stored in cloud-based tables or data sets accessible to the data visualization components.

Data Sync can:

  • Load data from non-Oracle sources in addition to flat files and Oracle tables
  • Execute incremental data loads, as well as rolling deletes and insert/append strategies
  • Merge data from multiple sources
  • Schedule data loads

Data Sync is installed on a local computer running either the Windows or Linux operating system.  Prior to installing Data Sync, ensure that Java Development Kit (JDK) 1.7 or later is installed on the local computer; it must be a full JDK, not just a JRE.  It is also necessary to validate that the user account that will be used to access the BICS database schema has the proper permissions.  Work with your cloud administrator to grant permission to upload data to Oracle BICS by assigning the BI Data Load Author application role to the account.  To upload data to a data set instead of a table, the BI Advanced Content Author application role should be assigned instead.

Installation Steps:

  1. Download the Data Sync software from Oracle Technology Network. Currently located at: Data Sync Download on OTN
  2. Unzip BICSDataSync.Zip to a local directory (no spaces in directory name)
  3. Set the JAVA_HOME variable in config.bat or config.sh to point to JDK location
  4. Copy database-specific JDBC drivers to Data Sync’s \lib directory

Data Sync consists of a server component and a client GUI.  To start Data Sync and its server component, run datasync.bat (Windows) or datasync.sh (Linux/UNIX) from the Data Sync installation directory. The Data Sync icon displays in the system icon tray to indicate that the server is up and running.

To access the client interface, click on the icon and choose Start UI to open the Data Sync client.

Click on the icon and choose Exit to stop the Data Sync server.

Data Sync Updates

Data Sync automatically checks against the cloud environment to see if a new version is available before each load job executes.  It is possible that a new version of Data Sync has been installed in the cloud that is incompatible with the local version.  If the versions match, the data load continues unaffected.  If the minor version of the tool has changed in the cloud, a patch is available; this triggers a one-time alert and an optional email if the Data Sync server is configured for email notifications.  If the version change indicates a major release, an alert is created, an email is sent, and the data load job is stopped; it will not run until the new version is installed.

New versions (minor or major) are simple to install: follow the standard installation process into a new home directory and then copy the existing environment configuration into the new installation.

Data Sync Terminology

Connection: Defines the data sources and target databases.

Project: A container which describes and organizes the information related to a data load. There can be any number of projects within a Data Sync instance.

Job: The mechanism for uploading all the data sources defined in a project to BICS.

Load Strategies

When moving data to the cloud, a load strategy defines how that data is loaded from the data source into the target. Incremental data loading is available as long as there is a column on the source table that uniquely identifies each row (a primary key) as well as another column with a Date/Timestamp data type that can be used to identify which rows have been added or modified since the previous load.  During the initial load job, the full source table is transmitted to the cloud.  In subsequent loads, each row's last update date is compared to a timestamp stored from the previous run so that only new or changed rows are uploaded.
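
Data Sync manages this bookkeeping itself, but the underlying watermark pattern is simple enough to sketch. The following Python example is illustrative only, not Data Sync code; the table name, column names, and the in-memory SQLite stand-in for the real source are all assumptions.

import sqlite3
from datetime import datetime, timedelta

# Not Data Sync internals -- just the generic watermark pattern described above:
# the first run extracts everything; later runs pull only rows whose timestamp
# column is newer than the stored watermark. Table and column names are made up.

def extract_changed_rows(conn, last_refresh=None):
    cur = conn.cursor()
    if last_refresh is None:
        # Initial load: the full source table is transmitted.
        cur.execute("SELECT order_id, amount, last_update_dt FROM src_orders")
    else:
        # Incremental load: only rows added or modified since the previous run.
        cur.execute("SELECT order_id, amount, last_update_dt FROM src_orders"
                    " WHERE last_update_dt > ?", (last_refresh,))
    return cur.fetchall()

# Demo with an in-memory SQLite table standing in for the real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (order_id INTEGER, amount REAL, last_update_dt TEXT)")
conn.execute("INSERT INTO src_orders VALUES (1, 100.0, ?)",
             ((datetime.now() - timedelta(days=2)).isoformat(),))
conn.execute("INSERT INTO src_orders VALUES (2, 250.0, ?)", (datetime.now().isoformat(),))

print(len(extract_changed_rows(conn)))  # 2 rows -> behaves like the initial full load
watermark = (datetime.now() - timedelta(days=1)).isoformat()
print(len(extract_changed_rows(conn, watermark)))  # 1 row -> incremental load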

Load strategy options include:

  • Replace data in table: Truncates existing data and always reloads the table. Any indexes are dropped prior to the data load and are recreated after the load
  • Append data to table: New data is added to the table without checking for any prior existence of data.  Indexes are not dropped before the load.  Any new indexes defined on the source are created after the data load
  • Update table (add new records only): Requires a primary key column. If a row with the same key does not exist, it is inserted; otherwise the record is ignored
  • Update table (update existing records): Requires a primary key column.  If a row with the same key exists, it is updated; otherwise it is ignored (the sketch after this list illustrates the difference between the two update strategies)
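
The two update strategies differ only in which side of the key match they act on. The following is a plain-Python illustration of that difference, not Data Sync code; the sample rows are made up.

# Illustration only: the target table is modeled as a dict keyed by primary key.
target = {1: "existing A", 2: "existing B"}
incoming = {2: "changed B", 3: "new C"}

# "Update table (add new records only)": insert unknown keys, ignore matches.
add_new_only = dict(target)
for key, row in incoming.items():
    if key not in add_new_only:
        add_new_only[key] = row
# -> {1: 'existing A', 2: 'existing B', 3: 'new C'}

# "Update table (update existing records)": update matches, ignore unknown keys.
update_existing = dict(target)
for key, row in incoming.items():
    if key in update_existing:
        update_existing[key] = row
# -> {1: 'existing A', 2: 'changed B'}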

Loading Data Sets

Data sets are separate data objects used primarily by the Visual Analyzer component of BICS.  They can be thought of as separate data files stored in the cloud.   Data Sync can load those data sets with the following guidelines:

  • If a data set of the same name does not exist, one is created automatically with default settings
  • All string and timestamp based source columns are set to attribute type, and numeric datatype columns are set as measures in the target data set
  • The maximum data set size is 50MB; data uploads fail if the data set exceeds the 50MB limit (a pre-flight size check is sketched after this list)
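
Because an upload simply fails once it crosses that limit, a quick pre-flight size check is cheap insurance. The sketch below is illustrative only; the file name is hypothetical.

import os

MAX_DATA_SET_BYTES = 50 * 1024 * 1024   # the 50MB limit noted above
source_file = "sales_extract.csv"       # hypothetical upload candidate

size = os.path.getsize(source_file)
if size > MAX_DATA_SET_BYTES:
    raise SystemExit("%s is %.1f MB; data set uploads over 50MB fail."
                     % (source_file, size / 1048576.0))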

Loading Data with Jobs

Data Sync uses the concept of a job to organize, schedule, and execute load processes. A run is an instance of a data loading job; for example, if you run a job twice, you’ll see two run records on the History tab.  As is common with most scheduling applications, Data Sync allows jobs to recur on a schedule to meet whatever load frequency is required.

Parameterized Jobs

Parameters are available in Data Sync to customize data loads at run time by dynamically changing the conditions on the data selection step.  Rather than changing the SQL statements used to query the data sources, a parameter can be changed at the project or job level instead. A job-level parameter will override a parameter with the same name at the project level.

Chaining Data Sync Load Jobs

There may be occasions when it is beneficial to chain load jobs in a specific order. Possible reasons include the need to load multiple separate data source tables into a common target, or the need to load aggregate tables after base-level data tables have completed.  The first step is to separate the load jobs into distinct projects.

When a job starts, a signal file with a name like “job_name”_StartSignal.txt is created in the Data Sync log\jobSignal directory for each run of the job. A file with the naming pattern “job_name”_CompletedSignal.txt is created when the job completes successfully, or “job_name”_FailedSignal.txt when the job fails.  Data Sync has a polling mechanism to look for the existence of these files before executing another load job.

By editing the on_demand_job.xml file located in the conf-shared directory, you can specify the name of a file that will trigger a specific load job.

For example:

<TriggerFile job="Load_HR_Data" file="d:\Load_The_HR_Data_Now.txt"/>

In the example above, Data Sync polls for the presence of the Load_The_HR_Data_Now.txt file, and when it is found it triggers the execution of the Load_HR_Data job.  When the job is started, the triggering file is deleted.   A time window for polling can also be specified in the XML file.

<TriggerFile job="Load_HR_Data" file="d:\Load_The_HR_Data_Now.txt">
  <TimeWindow startTime="00:00" endTime="05:00"/>
  <TimeWindow startTime="21:00" endTime="23:00"/>
</TriggerFile>
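
Putting the signal files and the trigger file together, an external script could wait for one job's completed-signal file and then drop the trigger file that starts the next job. The sketch below is illustrative only; the directory, job names, and polling interval are assumptions, not part of Data Sync itself.

import os
import time

# Hypothetical paths -- adjust to the actual Data Sync installation.
SIGNAL_DIR = r"D:\DataSync\log\jobSignal"
COMPLETED = os.path.join(SIGNAL_DIR, "Load_Staging_Data_CompletedSignal.txt")
FAILED = os.path.join(SIGNAL_DIR, "Load_Staging_Data_FailedSignal.txt")
TRIGGER = r"d:\Load_The_HR_Data_Now.txt"  # the file named in on_demand_job.xml above

# Poll until the upstream job finishes, then trigger the downstream job.
while True:
    if os.path.exists(FAILED):
        raise SystemExit("Upstream job failed; not triggering Load_HR_Data.")
    if os.path.exists(COMPLETED):
        open(TRIGGER, "w").close()  # Data Sync deletes this file once it starts the job
        break
    time.sleep(60)  # check once a minute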

Two other options exist for triggering Data Sync load jobs, either from the command line using the datasyncCmdLine.bat/.sh script file or else with polling tables stored in a database.  Both of these methods are described in detail in the Data Sync documentation.

Monitoring and Troubleshooting Data Loads

Data load jobs can be monitored while in progress using the Current Jobs tab or History tab in Data Sync.   The top panel shows the job summary information. The Tasks tab underneath that panel shows one record per user-defined data flow. The tasks and their details show important metrics, including start and end timestamps, the number of rows read and written, and throughput.

The run log files are stored in the Data Sync log directory. One directory per run is created with a naming convention of CR_<Table/File Name>_<From Connection>_<To Connection>.<Timestamp>.log.
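
That convention makes the per-run logs easy to locate from a script when troubleshooting. The sketch below is illustrative only; the installation path is an assumption.

import glob
import os

LOG_DIR = r"D:\DataSync\log"  # hypothetical Data Sync installation path

# Entries follow the convention above:
# CR_<Table/File Name>_<From Connection>_<To Connection>.<Timestamp>.log
runs = glob.glob(os.path.join(LOG_DIR, "CR_*"))

# Most recently modified first -- usually the run being troubleshot.
for path in sorted(runs, key=os.path.getmtime, reverse=True)[:5]:
    print(path)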

There are options available within the Connections configuration screens to reset a data target to do a full reload if necessary.  The reload can be for all connections or individual tables within a connection.

Data Sync Configuration Migration

As with most development environments, it may be necessary to maintain multiple instances of Data Sync for DEV, TEST, and PROD.  This can be managed within the Data Sync interface by exporting the metadata from one environment and importing it into the next.  The export and import options are available under the Tools menu.

When exporting, there are options to export the following metadata types:

  • Logical: Exports all information contained in the Project view
  • System: Exports all information contained in the Connections view, except passwords for servers and database connections
  • Run Time: Exports information about jobs and schedules contained in the Jobs view
  • User Data: Exports users, roles, and passwords

Kscope17 – Day 3 (Time to Consider Hybrid Cloud?) - 27-Jun-2017 21:53 - TopDown Consulting

Another great day of sessions packed with information on EPM Cloud, JavaScript, DRM, and two fantastic case studies by TopDown on Hybrid Aggregation and handling textual data. For details, check out #Kscope17 on Twitter. With all the details, new features, problem solving, tips & tricks for getting more out of the applications, and then some, the most important piece of information at Kscope17 thus far is that on-premise is not gone yet. Oracle EPM v11.1.2.4 will receive additional PSEs to correct issues and add new functionality. The next major release is 11.2.0, and it sounds like it will be a complete makeover. The release is set for early 2019, and support is guaranteed until 2030. Watch this space for more news.

In the meantime, Hybrid Cloud should be on everyone’s mind. Over the past several years, industry analysts have consistently stated that the future looks decidedly cloud-based, and because of the benefits inherent in the cloud platform, the business world is moving at an ever-faster pace. While businesses wait for some of the EPM Cloud applications to mature, they can take advantage of the flexibility of choice: use what best fits your needs.

To find out more about Hybrid Cloud, grab a cup of coffee and join TopDown’s Paul Hoch and Steve Davis tomorrow (Wednesday) at 8:30 am in the Peony room. Details below.

Up Next

TopDown closes Kscope17 sessions with four of our own covering Hybrid Cloud, Agile Scrum, Calc Manager, and Essbase. Enough to make your head spin.

8:30 a.m. – 9:30 a.m.

Hybrid Cloud – A Solution to a Business Problem

Using or considering using the Hybrid Approach? Join Paul Hoch and Steve Davis tomorrow morning to answer a typical question companies ask themselves about the Hybrid Cloud: “What is the right approach, and how do I make sure all my applications are seamlessly integrated?”

11:15 a.m. – 12:15 p.m.

Agile/Scrum, A New Way to Manage Your Project Development and Delivery

Do you know about Scrum and Agile Software Development? At 11:15 a.m. this morning, TopDown’s Michael Smith will describe the challenges TopDown faced with Scrum, how we overcame them, and the successes we have had with it.

1:45 p.m. – 2:45 p.m.

Extreme Essbase Calcs: What-if, Goal Seeking and Sensitivity Analysis

Geek Alert: Ron Moore and Ludovic De Paz are at it again. This time they’re talking what-if calculations, goal seeking and sensitivity analysis in Essbase.

4:15 p.m. – 5:15 p.m.

How to Get the Most Out of Essbase in the Cloud

Wondering how to get the most out of Essbase in the cloud? Join Paul Hoch today at 4:15 p.m. as he closes out the Kscope17 sessions by covering how Essbase in the Cloud can replace, enhance or serve the same role as your current on-premise Essbase environment.

 

#whatkeepsyouupatnight

We’re starting to see some trends on our whiteboard. Stop by on one of the breaks and put a check mark next to your challenge if you find it and add it if you don’t. And in between follow us on Twitter for tidbits about what’s happening at the conference: interesting thoughts from sessions, EPM challenges from your friends and colleagues, and more.

Check back tomorrow for a Kscope17 wrap.

When loading data into Oracle EPM applications the tool of choice is Financial Data Quality Management Enterprise Edition (FDMEE). Dating back to Classic FDM it has been possible to load either by single period or multi-period.

In Classic FDM there was a performance penalty associated with processing multi-period files. The issue with Classic FDM was that behind the scenes the application would split a multi-period file into single period files. This added extra overhead into the process and depending on the size of the files could lead to significantly slower processing. This is probably due to the underlying technology being a mix of Java, .NET and Classic Visual Basic (VB).

Classic FDM Batch Data Flow Diagram

With FDMEE much of the performance penalty has been mitigated; it probably helps that the marshaling of data between Java, .NET, and VB has been removed. In FDMEE everything runs through the Java Virtual Machine (JVM). While VB is still an option, most customers are now making the switch to Jython, which runs natively on the JVM. There is still, however, a problem with multi-period files: the format is quite rigid and requires that import formats dictate the periods that are to be processed. For many customers, this is an issue since we rarely load whole years of data at a time. This leaves users with two options. First, they can create a separate location and import format for each period layout (Jan-Dec vs Feb-Dec or Jan-Mar). Option two is to split the file into multiple files for processing via the single-period route. Many clients opt for option two because it eases maintenance of mapping and makes it easier for end users.

FDMEE Multi-Period Load Flow Diagram

The second option is not without its challenges, in that the file must be parsed and the resulting files must be named appropriately for batch processing. When using batch processing, files are processed in alphabetical order using a particular naming convention: [sort]~[rule name]~[month][year]~RR.txt (e.g., a~load rule~Mar2017~RR.txt). While it's not necessary to use the sort parameter, it allows the user to determine the order in which files are processed. Without the sort, files would be processed in the following order: April, August, February, March, etc.

On a past assignment, the client was using a series of ODI interfaces and variables in a loop to process the files. While effective, this was not very portable or reusable, and it was prone to error. At my insistence, they eventually refactored this process using Jython (this also yielded a small but noticeable performance improvement). Still, I felt the implementation was not ideal. First, they were making multiple passes through the file in code, one for each period. Second, they were not using proper sorting to ensure that files were processed chronologically. Me being me, and having time since we had frozen changes to DRM, I took the opportunity to re-write the process.

First, I set up some variables to set the expected delimiter and file name.

delimiter = "|"
data_load_rule = 'APP_OTLK_Prod'

Next, I wanted to open the file in read mode and determine the dimensions and number of periods. This is done by reading line one for the dimensions and splitting using the delimiter. Reading the periods was a bit trickier since the months were quoted. I used a substitution regular expression to remove the quotes and then stripped the line to remove the trailing newline character.

import re  # needed for the re.sub call below

f = open(data_load_rule + ".txt", 'r+')
dimension = f.readline().split(delimiter) # first line contains dimensions
month = re.sub('"', '', f.readline()).strip().split(delimiter) # second line contains months

File Format:

"Version"|"Organization"|"Scenario"|"Currency"|"Years"|"Measure"|"Period"
"Jan"|"Feb"|"Mar"|"Apr"|"May"|"Jun"|"Jul"|"Aug"|"Sep"|"Oct"|"Nov"|"Dec"
"Final"|"ORG1"|"Actual"|"USD"|"FY17"|"A10000"|10|20|30|40|50|60|70|80|90|100|110|120

Then I used the position of the Years dimension to determine the year in the first line of data; since it was a 4-digit year, I only wanted the last two characters (16 from 2016). Since the values in the files we received were quoted, I had to use "Years" vs Years.

line = f.readline().strip()
year = line.split(delimiter)[dimension.index('"Years"')].strip('"')[2:]

At this point, I had the information I needed to determine the number of files I needed to create, so I created an array of file pointers. To allow for easier future use, I created a method to perform this task. The method takes in three parameters: the name of the rule, an array of months, and the year. In the method, I created an array to hold the file pointers and then looped through the months, appending file pointers to the array. I used the index of the month in the array to determine the sort. This was done by taking the ordinal value of 'a' and adding the index of the month to it. Then I specified that the files should be opened in write mode before returning the array of file pointers.

def get_files(rule, months, year):
    files = []
    for month in months:
        files.append(open("a%(index)s~%(rule)s~%(month)s20%(year)s~RR.txt" %
                          {'index': chr(ord('a') + months.index(month)), 'rule': rule,
                           'month': month, 'year': year}, 'w'))
    return files

Next, I created a while loop that ends when the line received is blank, making sure to strip newline characters and spaces from the beginning and end of each line. Finally, I closed the file pointers to release the memory.

while True:
    if line == '':
        break
    ...
    line = f.readline().strip()
f.close()
for f in files:
    f.close()

Then I needed to process the individual lines of the file into an array of lines that could easily be written to my array of file pointers. To do this I created another method. This method takes an array, generated by splitting the current line with the delimiter (pipe | in this instance). The method also takes a positive integer value indicating the number of columns (minus the period column), as well as the array of months. In the method, I created the out variable, an array to hold the lines about to be created. Next, using the join command, I concatenated the dimensions on the line back together to form the dims variable. After that, I looped through the available months and added items to the out array based on the index of the month being processed. I also checked to make sure that the data value for a particular month was not blank (this ended up making the resulting output files smaller than in the previous implementation, which led to an additional performance gain).

def create_lines(arr, columns, months, delim='|'):
    out = []
    dims = delim.join(arr[0:columns])
    for i in range(0, len(months)):
        if len(arr) > columns + i:
            if arr[columns + i]:
                out.append("%(dims)s%(delim)s%(month)s%(delim)s%(value)s\n" % \
                       {'dims': dims, 'month': months[i], 'value': arr[columns + i], 'delim': delim})
            else:
                out.append("")
    return out

Finally, I modified the while loop to output the lines to their respective files. This was done by looping through the lines in the array and using the index corresponding to the month.

while True:
    if line == '':
        break
    lines = create_lines(line.split(delimiter), len(dimension) - 1, month, delimiter)
    for i in range(0, len(lines)):
        files[i].write(lines[i])
    line = f.readline().strip()
f.close()
for f in files:
    f.close()

Below I have included the full script. Please feel free to use all or part of it in your implementation. Cheers.
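
The full script is not reproduced in the post, so as a convenience here is a sketch that assembles the fragments above into one runnable whole; the author's original may differ in details such as error handling.

# Sketch: the fragments above assembled into a single script.
import re

delimiter = "|"
data_load_rule = 'APP_OTLK_Prod'


def get_files(rule, months, year):
    # One output file per month, named per the batch convention described above.
    files = []
    for month in months:
        files.append(open("a%(index)s~%(rule)s~%(month)s20%(year)s~RR.txt" %
                          {'index': chr(ord('a') + months.index(month)), 'rule': rule,
                           'month': month, 'year': year}, 'w'))
    return files


def create_lines(arr, columns, months, delim='|'):
    # Turn one multi-period source row into one single-period line per month.
    out = []
    dims = delim.join(arr[0:columns])
    for i in range(0, len(months)):
        if len(arr) > columns + i:
            if arr[columns + i]:
                out.append("%(dims)s%(delim)s%(month)s%(delim)s%(value)s\n" %
                           {'dims': dims, 'month': months[i], 'value': arr[columns + i], 'delim': delim})
            else:
                out.append("")
    return out


f = open(data_load_rule + ".txt", 'r+')
dimension = f.readline().split(delimiter)  # first line contains dimensions
month = re.sub('"', '', f.readline()).strip().split(delimiter)  # second line contains months

line = f.readline().strip()  # first data line
year = line.split(delimiter)[dimension.index('"Years"')].strip('"')[2:]

files = get_files(data_load_rule, month, year)

while True:
    if line == '':
        break
    lines = create_lines(line.split(delimiter), len(dimension) - 1, month, delimiter)
    for i in range(0, len(lines)):
        files[i].write(lines[i])
    line = f.readline().strip()
f.close()
for f in files:
    f.close()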

Kscope 17, day 1, part 2 - 27-Jun-2017 13:13 - Cameron Lackpour
OMG, I’m already exhausted
I’m not sure anyone could tell the difference (or indeed, care) that after the first full day, I am already at 3 ½ hours of sleep.  It isn’t going to get better.  And yet, Gentle Reader, here is Yr. Obt., Fthl., & Most Hmbl. Svt. tip tapping away with my usual drivel.  Into every life some rain must fall on your (not mine) head.

So, Cameron, what did you do yesterday evening?

If I didn’t have Twitter, I’d never know, using the theory of “Pictures or it didn’t happen”.  Hoo-boy, it was busy.

Check out yesterday’s post for details on the evening community entertainment.  I am semi-ashamed to admit that I went to the BI, not the EPM, event.  Why, Cameron, why?  ‘Cos I got to see this:

This isn’t the last surprising Gabby snap.  Read on…

Embarrassing photos aside, the event was fantastic, and we were just pipped to 1st place (it was a pub quiz with a Texas flavor) by those !@#$ing cowboys.  Bastards.

Cameron and Natalie’s EPM meetup

This is the 10th anniversary of my bestie Natalie Delamar.  ODTUG Madame President went all out as usual with a party.  It was amazing.  Thank you, Ride or Die Girl.  It’s been quite a trip, and one I wouldn’t have missed for anything.

Here’s a picture of the layout.  Amazing, Natalie.  I can’t figure out how you have time to do all that you do.

It was a hopping party.

Dan Keith, Amy Dailey, Tim German, Celvin Kattkooran’s forehead (I am not going to pursue a career as a photographer), and Tim German.

Here I am with my buds Amy Dailey and Natalie.

The real surprise.

Who knew that Gabby Rubin moonlights as a barman?  The drinks he made were amazing.  Incredible.  Awesome.  Thank you so much, Gabby.

All for now

I’ll try, maybe, to get today’s events in.  Follow my Twitter feed for up-to-date information.

Be seeing you.

Kscope 17 Recap - Day 1 - 27-Jun-2017 10:20 - Keith Kikta

Fun times out here in San Antonio. If you haven't been, you definitely want to check out Kscope next year in Orlando. I haven't let the family know yet, but something tells me the whole crew will be there.

So things we learned yesterday:

  • Big data has some really cool tools; Kafka looks pretty neat.
  • Groovy makes me super happy in that we now have a real language inside of Essbase. Also, there are options to get Groovy out of the sandbox and let you access system resources. Thanks, Celvin.
  • Cloud, Cloud, Cloud, and oh yeah, we have a new road map for on-premise, but it's still a ways off. It was a packed house (I couldn't get in, but I hear there were some not-so-happy customers).
  • Some of the vendors have some really cool stuff.
    • Incorta looks amazing, is wicked fast, and should muddy the BI waters a bit more.
    • Viscosity has some really cool hardware; if you are on-prem, check them out.
    • Dodeca continues to impress.
  • Not all the sessions have been great, for one reason or another, but we have to do the surveys honestly so that the ODTUG board can work to make next year better.

Some not so Oracle things I learned:

  • I need to drink more water
  • Sleep is overrated
  • If you find a chair take it
  • It's been a lot of fun seeing old faces and I've enjoyed meeting many new faces.
  • No one expected the beard