Business Objects 4 SDK

Since BO4 the SDKs have changed drastically. Many methods I used in a BO3 project simply throw a “NotImplementedException” after switching to the BO4 libraries.

Apparently, the intended way now is to use the RESTful Web Services API. The handling is totally different, but after working with it for a day I have to say that it is pretty straightforward – in contrast to the “old” APIs.

First you need a REST client that handles the HTTP messages. Using this client you simply call certain URLs (either GET or POST), passing a logon token. The logon token is obtained by calling a login URL. The HTTP GETs return an XML structure containing the data, which then has to be parsed (e.g. with XPath).
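
The examples below use a small Request helper class that I don’t show here. To give an idea of what it does, here is a minimal sketch based on HttpURLConnection – an assumption about the helper, not the original implementation:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Minimal HTTP helper as assumed by the examples below (hypothetical sketch)
public class Request {

	private String logonToken;
	private int responseCode;
	private String responseContent;
	private Map<String, List<String>> responseHeaders;

	public Request() {
	}

	public Request(String logonToken) {
		this.logonToken = logonToken;
	}

	// Sends a GET or POST to the given URL and stores response code, headers and body
	public String send(String url, String method, String body) throws IOException {
		HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
		con.setRequestMethod(method);
		con.setRequestProperty("Content-Type", "application/xml");
		con.setRequestProperty("Accept", "application/xml");
		if (logonToken != null)
			con.setRequestProperty("X-SAP-LogonToken", logonToken);
		if (body != null) {
			con.setDoOutput(true);
			try (OutputStream os = con.getOutputStream()) {
				os.write(body.getBytes(StandardCharsets.UTF_8));
			}
		}
		responseCode = con.getResponseCode();
		responseHeaders = con.getHeaderFields();

		InputStream in = responseCode < 400 ? con.getInputStream() : con.getErrorStream();
		StringBuilder sb = new StringBuilder();
		if (in != null) {
			try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
				String line;
				while ((line = reader.readLine()) != null)
					sb.append(line).append('\n');
			}
		}
		responseContent = sb.toString();
		return String.format("%s %s -> %s", method, url, responseCode);
	}

	public int getResponseCode() { return responseCode; }
	public String getResponseContent() { return responseContent; }
	public Map<String, List<String>> getResponseHeaders() { return responseHeaders; }
}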

Here are some examples I created today:

The logon, which obtains the logon token

public void logon(String username, String password) throws Exception {
	logger.debug("Will logon now");
	if (this.logonToken != null)
		throw new IllegalArgumentException("Please logoff first");

	// GET the empty logon template from the server
	Request request = new Request();
	request.send(baseUrl + "/logon/long", "GET", null);

	// Fill the template with the logon information
	Map<String, String> map = new HashMap<String, String>();
	map.put("//attr[@name='userName']", username);
	map.put("//attr[@name='password']", password);
	map.put("//attr[@name='auth']", "secEnterprise");
	String filledLogonResponse = commonUtils.fillXml(request.getResponseContent(), map);

	// POST the filled template; the server returns the logon token in a response header
	String trace = request.send(baseUrl + "/logon/long", "POST", filledLogonResponse);
	logger.trace(trace);

	this.logonToken = request.getResponseHeaders().get("X-SAP-LogonToken").get(0);
	logger.debug("Logon token: " + logonToken);
}

Don’t forget to log off:

	public void logoff() throws Exception {
		logger.debug("Will logoff now");

		Request request = new Request(logonToken);
		String trace = request.send(baseUrl + "/logoff", "POST", null);
		logger.trace(trace);

		// clear the token so a new logon is possible
		this.logonToken = null;
	}

list documents

	public Document listDocuments(int limit, int offset) throws Exception {
		logger.debug("Will fetch document list");
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents?limit=%s&offset=%s", limit, offset), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	}

get document properties

public Document getDocProperties(int documentId) throws Exception {
		logger.debug("Will fetch document properties");
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents/%s/properties", documentId), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	    
	}

get document variables

	public Document getVariables(int documentId) throws Exception {
		logger.debug("Will fetch document variables");
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents/%s/variables", documentId), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	    
	}

get dataprovider

public Document getDataProviderDetails(int documentId, String dataproviderId) throws Exception {
		logger.debug(String.format("Will fetch dataprovider details for document %s, dataprovider %s", documentId, dataproviderId));
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents/%s/dataproviders/%s", documentId, dataproviderId), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	    
	}

get report pages (called reports)

public Document getDocumentReports(Integer documentId) throws Exception {
		logger.debug("Will fetch document structure");
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents/%s/reports", documentId), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	  
	}

get the report map

public Document getReportMap(Integer documentId, String reportId) throws Exception {
		logger.debug("Will fetch document map");
		if (this.logonToken == null)
			throw new IllegalArgumentException("Please logon first");
		
		Request request = new Request(this.logonToken);
		String trace = request.send(baseUrl + String.format("raylight/v1/documents/%s/reports/%s/map", documentId, reportId), "GET", null);
		logger.trace(trace);
		if (request.getResponseCode() != 200)
			throw new RestException(String.format("ReturnCode: %s, Message %s", request.getResponseCode(), request.getResponseContent()));
		
		return commonUtils.getXmlDocument(request.getResponseContent());
	  
	}
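
Putting it together, here is a hedged usage sketch. The class name BoRestClient (holding the methods above), the credentials and the XPath expression are assumptions for illustration; the exact XPath depends on the XML the server actually returns, and getXmlDocument is assumed to return an org.w3c.dom.Document:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RestClientDemo {
	public static void main(String[] args) throws Exception {
		// hypothetical wrapper class containing logon(), listDocuments(), logoff() etc.
		BoRestClient client = new BoRestClient();
		client.logon("Administrator", "secret");
		try {
			Document docList = client.listDocuments(10, 0);

			// pick the document names out of the response; the XPath expression is an assumption
			XPath xpath = XPathFactory.newInstance().newXPath();
			NodeList names = (NodeList) xpath.evaluate("//document/name", docList, XPathConstants.NODESET);
			for (int i = 0; i < names.getLength(); i++) {
				System.out.println(names.item(i).getTextContent());
			}
		} finally {
			client.logoff();
		}
	}
}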

 

a Storm test

Here are my first steps towards real-time ETL using Storm. It’s only a rough skeleton for now, but it works.

The Spout (will read the source)

package tki.bigdata.test;

import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OracleSourceSpout extends BaseRichSpout {
	public static Logger LOG = LoggerFactory.getLogger(OracleSourceSpout.class);
    SpoutOutputCollector _collector;
    
	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this._collector = collector;
		
	}

	@Override
	public void nextTuple() {
		// emit one dummy record per second; a real spout would read from the actual source here
		Utils.sleep(1000);
		_collector.emit(new Values("here comes a new record"));
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("record"));
		
	}

}
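
The spout above only emits a dummy string every second. In a real setup, open() would acquire a connection to the source and nextTuple() would emit actual rows. Here is a hedged sketch of what that could look like with plain JDBC – connection details, table and column names are made up, and the Oracle JDBC driver is assumed to be on the classpath:

package tki.bigdata.test;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class JdbcPollingSpout extends BaseRichSpout {
	private SpoutOutputCollector collector;
	private transient Connection connection;
	private long lastSeenId = 0; // naive high-water mark so only new rows are emitted

	@Override
	public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
		this.collector = collector;
		try {
			// connection details are assumptions
			connection = DriverManager.getConnection("jdbc:oracle:thin:@//arwen:1521/orcl", "user", "password");
		} catch (Exception e) {
			throw new RuntimeException("Could not open JDBC connection", e);
		}
	}

	@Override
	public void nextTuple() {
		// poll the (made-up) source table once per second and emit every new row
		try (Statement stmt = connection.createStatement();
				ResultSet rs = stmt.executeQuery("SELECT id, payload FROM source_table WHERE id > " + lastSeenId)) {
			while (rs.next()) {
				lastSeenId = rs.getLong("id");
				collector.emit(new Values(rs.getString("payload")), lastSeenId); // message id enables acking
			}
		} catch (Exception e) {
			throw new RuntimeException("Polling the source failed", e);
		}
		Utils.sleep(1000);
	}

	@Override
	public void declareOutputFields(OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("record"));
	}
}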

And the Bolt (which will save the data; there could of course be more bolts, chained and/or running in parallel):

package tki.bigdata.test;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class ProcessDataBolt extends BaseRichBolt {
    OutputCollector _collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
      _collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
      // pass the record on unchanged and acknowledge it; a real bolt would persist the data here
      _collector.emit(tuple, new Values(tuple.getString(0)));
      System.out.println("I will save tuple now " + tuple);
      _collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("record"));
    }


}

This runs the topology:

package tki.bigdata.test;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.topology.TopologyBuilder;

public class MyTopology {

	public static void main(String args[]) throws AlreadyAliveException, InvalidTopologyException, AuthorizationException, InterruptedException {
		TopologyBuilder builder = new TopologyBuilder();

	    builder.setSpout("spout", new OracleSourceSpout(), 1);

	    builder.setBolt("bolt", new ProcessDataBolt(), 1).shuffleGrouping("spout");
	    //builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

	    Config conf = new Config();
	    conf.setDebug(true);

	    if (args != null && args.length > 0) {
	      conf.setNumWorkers(3);

	      StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
	    }
	    else {
	      conf.setMaxTaskParallelism(3);

	      LocalCluster cluster = new LocalCluster();
	      cluster.submitTopology("OracleTopology", conf, builder.createTopology());

	      Thread.sleep(10000);

	      cluster.shutdown();
	    }
	}
}

 

Zookeeper / Kafka on the Mac

The following protocol shows how to set up ZooKeeper and Kafka on a Mac.

First, I download the software and unpack it into a new folder ~/opt:

Tobiass-Air:~ tk$ mv ~/Downloads/zookeeper-3.4.8.tar.gz ~/opt/
Tobiass-Air:~ tk$ mv ~/Downloads/kafka
kafka-0.10.0.0-src.tgz   kafka_2.11-0.10.0.0.tgz  
Tobiass-Air:~ tk$ mv ~/Downloads/kafka_2.11-0.10.0.0.tgz ~/opt/
Tobiass-Air:~ tk$ cd ~/opt/
Tobiass-Air:opt tk$ tar -zxf zookeeper-3.4.8.tar.gz 
Tobiass-Air:opt tk$ tar -zxf kafka_2.11-0.10.0.0.tgz 

I am doing a basic configuration and some checks:

I simply copy zoo_sample.cfg to zoo.cfg and change only the entry dataDir=~/opt/temp.

Tobiass-Air:~ tk$ cd ~/opt/zookeeper-3.4.8
Tobiass-Air:zookeeper-3.4.8 tk$ cd conf/
Tobiass-Air:conf tk$ ls
configuration.xsl	log4j.properties	zoo_sample.cfg
Tobiass-Air:conf tk$ cp zoo_sample.cfg zoo.cfg
Tobiass-Air:conf tk$ vi zoo.cfg

Starting zookeeper

Tobiass-Air:zookeeper-3.4.8 tk$ bin/zkServer.sh start &
ZooKeeper JMX enabled by default
Using config: /Users/tk/opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

[1]+  Done                    bin/zkServer.sh start

Stopping zookeeper

Tobiass-Air:zookeeper-3.4.8 tk$ bin/zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /Users/tk/opt/zookeeper-3.4.8/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED

Starting Kafka (when zookeeper is running)

Tobiass-Air:kafka_2.11-0.10.0.0 tk$ bin/kafka-server-start.sh config/server.properties &

Stopping Kafka

bin/kafka-server-stop.sh config/server.properties

create a topic

Tobiass-Air:$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
Created topic "mytopic".

list topics:

Tobiass-Air:$ bin/kafka-topics.sh --list --zookeeper localhost:2181
mytopic

start a producer in one terminal…

Tobiass-Air:kafka_2.11-0.10.0.0 tk$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic

.. and start a consumer in another terminal

Tobiass-Air:kafka_2.11-0.10.0.0 tk$ bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic mytopic

From this point on, every line typed into the producer terminal is printed in the consumer terminal as well.
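
The same can be done programmatically. A minimal Java producer – a sketch assuming the kafka-clients 0.10 dependency is on the classpath – could look like this:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MyTopicProducer {
	public static void main(String[] args) {
		Properties props = new Properties();
		props.put("bootstrap.servers", "localhost:9092");
		props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
		props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

		Producer<String, String> producer = new KafkaProducer<String, String>(props);
		for (int i = 0; i < 10; i++) {
			// send ten test messages to the topic created above
			producer.send(new ProducerRecord<String, String>("mytopic", Integer.toString(i), "message " + i));
		}
		producer.close();
	}
}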

This was a very basic test. Next time let’s create a more intriguing scenario 😉

First steps with Cloudera and Spark

Importing data from SQL Server and Oracle into HDFS

Sqoop will import the table and also generate an Avro schema file (*.avsc) in the folder sqoop was called from; the Avro data files themselves end up under the warehouse directory in HDFS.

sqoop import \
  -m 1 \
 --connect jdbc:sqlserver://Arwen:1433 \
 --username=bods \
 --password=**** \
 --table datamart.dbo.fct_txn \
 --compression-codec=snappy \
 --as-avrodatafile \
 --warehouse-dir=/user/tkidb

sqoop import \
  -m 1 \
 --connect jdbc:oracle:thin:tki/diplom@//arwen:1521/orcl \
 --username=tki \
 --password=**** \
 --table KONTOAUSZUG \
 --compression-codec=snappy \
 --as-avrodatafile \
 --warehouse-dir=/user/tkidb

Upload the generated *.avsc file (here “tkidb/sqoop_import_KONTOAUSZUG.avsc”) to HDFS as well. We need it to define the Hive table.

looking at the files on the Hadoop file system:

[cloudera@quickstart ~]$ hadoop fs -ls /user/tkidb
Found 3 items
drwxr-xr-x   - cloudera supergroup          0 2016-06-26 01:58 /user/tkidb/KONTOAUSZUG
drwxrwxrwx   - hive     supergroup          0 2016-06-26 02:14 /user/tkidb/kontoauszug
-rw-r--r--   1 cloudera supergroup       1407 2016-06-26 02:13 /user/tkidb/sqoop_import_KONTOAUSZUG.avsc

Create a Hive table from this file:

CREATE EXTERNAL TABLE kontoauszug
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///user/tkidb/KONTOAUSZUG'
TBLPROPERTIES ('avro.schema.url'='hdfs://quickstart.cloudera/user/tkidb/sqoop_import_KONTOAUSZUG.avsc');

After running INVALIDATE METADATA and REFRESH in Impala, the table can be queried.
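
The table can also be queried from Java via the Hive JDBC driver. A hedged sketch – it assumes hive-jdbc (plus its dependencies) on the classpath and that the quickstart VM accepts the default credentials:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QueryKontoauszug {
	public static void main(String[] args) throws Exception {
		Class.forName("org.apache.hive.jdbc.HiveDriver");
		// HiveServer2 endpoint of the quickstart VM; host, port and credentials are assumptions
		try (Connection con = DriverManager.getConnection("jdbc:hive2://quickstart.cloudera:10000/default", "cloudera", "cloudera");
				Statement stmt = con.createStatement();
				ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM kontoauszug")) {
			while (rs.next()) {
				System.out.println("rows: " + rs.getLong(1));
			}
		}
	}
}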

Accessing the file from the Spark shell

another test, using Spark…

scala> val f = sc.textFile("hdfs://quickstart.cloudera:8020/user/hive/warehouse/fints_segmente.csv")
....
2016-06-25 11:45:40,977 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Created broadcast 3 from textFile at <console>:12
f: org.apache.spark.rdd.RDD[String] = hdfs://quickstart.cloudera:8020/user/hive/warehouse/fints_segmente.csv MappedRDD[7] at textFile at <console>:12

scala> f.count
....
res3: Long = 7783
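
For reference, the same line count can also be done from a standalone Java program – a sketch assuming spark-core on the classpath:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CountLines {
	public static void main(String[] args) {
		SparkConf conf = new SparkConf().setAppName("CountLines").setMaster("local[*]");
		JavaSparkContext sc = new JavaSparkContext(conf);

		// same file as in the spark-shell test above
		JavaRDD<String> lines = sc.textFile("hdfs://quickstart.cloudera:8020/user/hive/warehouse/fints_segmente.csv");
		System.out.println("line count: " + lines.count());

		sc.stop();
	}
}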


WDMyCloud: Setting up Git

This is how I was able to set up Git.

(the procedure has to be repeated if the WDMyCloud has been killed)

Set up SSH for the desired user

Log in via ssh as root

ssh root@wdmycloud

Edit sshd_config

WDMyCloud:~# vi /etc/ssh/sshd_config
(the user must be listed under AllowUsers)

Restart ssh

WDMyCloud:~# /etc/init.d/ssh restart

Install Git

sudo aptitude update
sudo aptitude install git

creating graph visualizations using graphviz

Today I want to recommend the tool Graphviz, with which I was able to visualize a dependency graph of my current project’s processes in a very short time.

There are certainly more sophisticated libraries for Java or C#, but as time for documentation is always short, Graphviz turned out to be the optimal solution for me.

The result after one hour is this nice, overwhelming picture:

dependencies
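
For anyone who wants to try it: the DOT input for such a graph is only a handful of lines. Here is a small sketch of a Java program that writes a .dot file (the process names are made up), which can then be rendered with dot -Tpng:

import java.io.FileWriter;
import java.io.IOException;

public class DependencyGraph {
	public static void main(String[] args) throws IOException {
		// process names are made up for illustration
		String dot = "digraph dependencies {\n"
				+ "  rankdir=LR;\n"
				+ "  node [shape=box];\n"
				+ "  \"load_staging\" -> \"build_dimensions\";\n"
				+ "  \"build_dimensions\" -> \"build_facts\";\n"
				+ "  \"build_facts\" -> \"refresh_reports\";\n"
				+ "}\n";
		try (FileWriter out = new FileWriter("dependencies.dot")) {
			out.write(dot);
		}
		// render with: dot -Tpng dependencies.dot -o dependencies.png
	}
}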


SAP BO: Resolving loops with contexts in the Information Design Tool

Unlike in the old Universe Designer, contexts are no longer created directly in the universe. In the new Information Design Tool, the contexts live in the “data foundation” (.dfx) instead.

The picture shows a small data foundation: two fact tables and several dimensions, some of which are joined to both fact tables.

idt_loops_1

This is how loops arise. If you want to query transactions (FCT_TXN) together with time, there are, purely technically, two alternatives for building the SQL: FCT_TXN <-> DIM_ZEIT, or a longer path involving several other tables.

The Information Design Tool detects these loops (to see them, click the “Visualize loops” button):

idt_loops_2

idt_loops_3

Loops have to be resolved with contexts. In this case we need two contexts – one for the costs and one for the transactions. The contexts are created as follows:

  1. Click “Insert context”
  2. Name the context
  3. Right-click the context -> Edit graphically
  4. Select the joins in the graphical view
  5. It is not enough to select the joins that belong to the context; the remaining joins of the loop must be marked as “Excluded”.

If everything was done correctly, clicking “Regenerate loop resolution status” again shows a green check mark.

idt_loops_4

idt_loops_5


A start with Hortonworks

Why stay caged in the sandbox for long? I want to see Hadoop work in a multi-node installation. This is quite easy if you avoid some pitfalls:

Installation of a 5-node cluster

1) Prepare one VM and install CentOS. Use the minimal installation, everything plain vanilla.
But be aware: at the time of writing, CentOS 7 is not supported by Ambari. If you choose the newest version of the OS, you will run into trouble later when installing Ambari. Choose version 6 instead!

2) Networking has to be set up now. This is a good description:
http://architects.dzone.com/articles/centos-minimal-installatio

3) The rest of the procedure is well documented here:
https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=34832444#buildingavirtualized5-nodeHDP2.0cluster%28allwithinamac%29-MinimalInstallationusingDHCP

Basically you prepare 5 VMs and their network, install Ambari on the first node and let it do the rest for you.

4) Hue
If you need Hue, the nice user interface of the sandbox, you have to install it manually; Ambari will not provide it. The installation of Hue is documented here: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.6.0/bk_installing_manually_book/content/rpm-chap-hue.html
This was a little confusing to me at first. I edited the configuration files directly on the namenodes, and after a while I noticed that my changes had been overwritten. Of course! I had installed Ambari, and it is responsible for the configuration files. You have to edit the configuration files through Ambari’s web UI!


BODS execution from the command line

The easiest way to build the execution command is to use the Management Console. On the “Batch Job Configuration” tab you can export the execution command. As a precondition, the job must be added to a project in order to show up there:

export_execution_command

BODS will now create two files and write them to the log directory:
C:\ProgramData\SAP BusinessObjects\Data Services\log

  • Job_Data_Vault.bat
  • Job_Data_Vault.txt

The job can now easily be executed or scheduled by a third-party tool as follows:

Job_Data_Vault.bat Job_Data_Vault.txt

(If BODS has been installed on Windows in a path containing spaces, you might need to edit the batch file and wrap the command line in quotation marks.)
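
For example, a simple scheduler written in Java could invoke the generated files like this – a sketch; the log directory is the one mentioned above, everything else is an assumption:

import java.io.File;
import java.io.IOException;

public class RunBodsJob {
	public static void main(String[] args) throws IOException, InterruptedException {
		// directory where BODS wrote the generated .bat and .txt files (see above)
		File logDir = new File("C:\\ProgramData\\SAP BusinessObjects\\Data Services\\log");

		ProcessBuilder pb = new ProcessBuilder("cmd", "/c", "Job_Data_Vault.bat", "Job_Data_Vault.txt");
		pb.directory(logDir);
		pb.inheritIO(); // forward the job's console output

		int exitCode = pb.start().waitFor();
		System.out.println("BODS job finished with exit code " + exitCode);
	}
}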