
Tomorrow, PASS is sponsoring a Linux Marathon for all the SQL DBAs getting ready to go big with Linux.

The schedule is jam-packed with sessions on getting started with Linux, working with SQL Server on Linux, and a few sessions, including my own, on the essential tools a DBA needs to work with Linux.  I’m the last one on the schedule for the day, after which I’ll head down to the south side of town for the monthly (and very Christmas-y) SQL Server User Group meeting.

If you aren’t a member of the PASS organization, membership is free: you can register for the event, and you can get more out of the community by attending or speaking at your local SQL Server User Group, or by branching out to the national or virtual events.  Just remember, it is the most wonderful time of the year, peeps.





Copyright © DBA Kevlar [Pass Marathon on Linux for SQL Server, Wednesday, Dec. 13th], All Rights Reserved. 2017.

The post Pass Marathon on Linux for SQL Server, Wednesday, Dec. 13th appeared first on DBA Kevlar.

Microsoft SQL Server 2017: Two Useful New Features
This week I discovered two new features in Microsoft SQL Server 2017 that I think will be helpful to both developers and DBAs. These are two small changes, but they solve problems that have been a consistent headache for a long time.

IDENTITY_CACHE

This is a database-scoped configuration option available in Microsoft SQL Server 2017 and in public preview for Microsoft Azure SQL Database.

Microsoft SQL Server’s default behavior (still the default in 2017) is to cache identity column values so it can supply them quickly when an INSERT statement runs. IDENTITY_CACHE allows this behavior to be turned on or off at the database level; with the cache turned off, there will be no (or fewer) gaps in IDENTITY column values when the instance is rebooted or fails over.

This is useful for a couple of people:

  • DBAs who’ve been affected by this bug, or a similar issue.
  • Developers who need the IDENTITY column to be a gapless, incrementing value.

The first people on that list. The DBAs. The pure of heart. They’ve got a good solution and should look forward to Microsoft SQL Server 2017.

The Developers have come one step closer to solving their problem. However, to be clear, relying on gapless IDENTITY values is still not a good idea. Microsoft SQL Server does not guarantee sequential, gap-free values, because enforcing that would be a major bottleneck. Also, failed INSERTs will still cause IDENTITY values to be discarded. But if you’re working with a legacy system, or an application you have no control over, this might fix some headaches for you.

The code is:

ALTER DATABASE SCOPED CONFIGURATION SET IDENTITY_CACHE= [ON|OFF]

SELECT INTO ON FILEGROUP

Everyone’s favorite backup tool, SELECT … INTO, gains the ability to place the new table on a non-default FILEGROUP starting with Microsoft SQL Server 2017.

The syntax will be very simple:
SELECT ... INTO [tablename] ON [filegroup_name]

It’s often a problem to copy a large table because of the extra overhead on an already overloaded drive. Other people might be dealing with a roll-your-own archive process.

This allows the new table to be created and the data moved onto a different FILEGROUP quickly and easily.

In the third part of the series we will develop a pipeline that transforms messages from the “data” Pub/Sub topic, using messages from the “control” topic as source code for our data processor.

The idea is to utilize the Scala ToolBox, which lets us compile and evaluate code at runtime. It’s much easier than doing the same in Java; basically it’s just three lines of code:

import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

val toolbox = currentMirror.mkToolBox()
val tree = toolbox.parse(code)
toolbox.eval(tree)
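
For example, here is a tiny illustration of mine (not from the original post) of compiling and invoking a function at runtime with the same toolbox:

val inc = toolbox.eval(toolbox.parse("(x: Int) => x + 1")).asInstanceOf[Int => Int]
inc(41) // returns 42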

The problem we have to solve is performance: parsing and evaluating this code for each processed element would be very inefficient. On the other hand, “control” is not something that changes frequently, so we can definitely benefit from memoization. We could cache results per DoFn instance, but it’s much better to cache them at the JVM level. We just need to take care of thread safety. For the sake of simplicity I will use a synchronized approach, while it would be more idiomatic and efficient to use Futures.

One more thing we have to take care of with memoization: how many results will we keep cached? Here is my simple attempt:

class Memoize[K, V](size: Int)(fun: K => V) {
  // thread-safe cache of computed values plus an insertion-order queue used for eviction
  private val cache = scala.collection.concurrent.TrieMap.empty[K, V]
  private val queue = scala.collection.mutable.Queue.empty[K]
  def apply(k: K): V =
    cache.getOrElse(k,
      this.synchronized {
        // re-check under the lock in case another thread has already computed the value
        cache.getOrElse(k, {
          // evict the oldest entry once the cache is full
          if (queue.size >= size) {
            cache.remove(queue.dequeue())
          }
          queue.enqueue(k)
          val v = fun(k)
          cache.put(k, v)
          v
        })
      })
}
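
A quick illustration of the intended behavior (the function and values here are just my own demonstration):

val slowSquare = new Memoize[Int, Int](100)(x => { Thread.sleep(1000); x * x })
slowSquare(3) // computed, takes about a second
slowSquare(3) // returned from the cache almost instantly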

Please note that all callers asking for a result for any not-yet-cached argument will be blocked until the value currently being computed is stored in the cache.
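
For reference, the Future-based variant mentioned above could look roughly like this sketch (my own illustration, unbounded for brevity, not the code used in this post). Callers get a Future immediately instead of blocking, and each key is computed only once:

import scala.concurrent.{ExecutionContext, Future}

class MemoizeAsync[K, V](fun: K => V)(implicit ec: ExecutionContext) {
  private val cache = scala.collection.concurrent.TrieMap.empty[K, Future[V]]
  // the Future is stored immediately, so concurrent callers share the same computation
  def apply(k: K): Future[V] = cache.getOrElseUpdate(k, Future(fun(k)))
}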

The next thing to think about is what K and V should be in our case. K is obviously String: the source code. For V we need something that performs the parsing itself, an INPUT => OUTPUT function. To minimize the code in “control” messages we convert the input data into a JsonObject before passing it to the function, and we want a TableRow as the result. Thus V is JsonObject => TableRow.

So we expect a “control” message to look something like this:

  (json: com.google.gson.JsonObject) => {
    new com.google.api.services.bigquery.model.TableRow()
      .set("id", json.get("id").getAsLong)
      .set("data", json.get("text").getAsString)
  }

Now it’s time to define an object that provides us with the ability to do something like:

val tableRow = code.evalFor(json)

Here is such an object:

object Dynamic {
  private val toolbox = currentMirror.mkToolBox()
  private val dynamic = new Memoize(10)((code: String) => {
    val tree = toolbox.parse(code)
    toolbox.eval(tree).asInstanceOf[JsonObject => TableRow]
  })
  def apply(code: String) = dynamic(code)
}

We keep single per-JVM instances of the toolbox and of the memoized dynamic compiler. So now we can simply use:

val tableRow = Dynamic(code)(json)

To make it work as a String method, as shown above, we use an implicit value class:

  implicit class RichString(val code: String) extends AnyVal {
    def evalFor(arg: JsonObject): TableRow = Dynamic(code)(arg)
  }

Now the only thing we change from our previous side-input example is a slightly altered MyDoFn:

  class MyDoFn(sideView: PCollectionView[java.util.List[String]]) extends DoFn[String, TableRow] with LazyLogging {
    @ProcessElement
    def processElement(c: ProcessContext) {
      val t0 = System.currentTimeMillis()
      val sideInput = c.sideInput(sideView).get(0)
      val inputString = c.element()
      logger.info(s"Getting new data=$inputString")
      Try {
        val json = new JsonParser().parse(inputString).getAsJsonObject
        sideInput.evalFor(json)
      } match {
        case Success(row) =>
          logger.info(s"Inserting to BiqQuery: $row")
          c.output(row)
        case Failure(ex) =>
          logger.info(s"Unable to parse message: $inputString", ex)
      }
      val t1 = System.currentTimeMillis()
      logger.info(s"Processed data in ${t1 - t0} ms")
    }
  }

I added timing log messages to show how much time using the cached functions saves.

We are ready to test it now. Start it locally with sbt:

$ sbt run
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from C:\Users\Valentin\workspace\beam-dynamic\project
[info] Loading settings from build.sbt ...
[info] Set current project to beam-dynamic (in build file:/C:/Users/Valentin/workspace/beam-dynamic/)
[info] Running com.pythian.Beam

Now publish our first attempt at the parsing code into the “control” topic and then publish data to the “data” topic:

//for the "control" topic
  (json: com.google.gson.JsonObject) => {
    new com.google.api.services.bigquery.model.TableRow()
      .set("id", json.get("id").getAsLong)
      .set("data", "placeholder")
  }

// for the "data" topic
{"id":1,"text":"row1"}
{"id":2,"text":"row2"}
{"id":3,"text":"row3"}

We can see how data is processed in the logs:

2017/12/11 17:47:24.049 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":1,"text":"row1"}
2017/12/11 17:47:24.544 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":2,"text":"row2"}
2017/12/11 17:47:25.816 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=1, data=placeholder}
2017/12/11 17:47:25.816 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=2, data=placeholder}
2017/12/11 17:47:25.880 INFO  com.pythian.Beam$MyDoFn - Processed data in 923 ms
2017/12/11 17:47:25.880 INFO  com.pythian.Beam$MyDoFn - Processed data in 336 ms
2017/12/11 17:47:30.076 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":3,"text":"row3"}
2017/12/11 17:47:30.076 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=3, data=placeholder}
2017/12/11 17:47:30.077 INFO  com.pythian.Beam$MyDoFn - Processed data in 1 ms

It took about 1 second to compile the code for id=1; for id=2, processing waited for some time on synchronized while the “control” function was being compiled; and for id=3 it took just 1 ms.

Now let’s change our parsing code and publish another 3 data rows:

//for the "control" topic
  (json: com.google.gson.JsonObject) => {
    new com.google.api.services.bigquery.model.TableRow()
      .set("id", json.get("id").getAsLong)
      .set("data", json.get("text").getAsString)
  }

//for the "data" topic
{"id":4,"text":"row4"}
{"id":5,"text":"row5"}
{"id":6,"text":"row6"}

And here is a log:

2017/12/11 17:54:09.859 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":4,"text":"row4"}
2017/12/11 17:54:10.237 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=4, data=row4}
2017/12/11 17:54:10.238 INFO  com.pythian.Beam$MyDoFn - Processed data in 379 ms
2017/12/11 17:54:26.885 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":5,"text":"row5"}
2017/12/11 17:54:26.885 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=5, data=row5}
2017/12/11 17:54:26.886 INFO  com.pythian.Beam$MyDoFn - Processed data in 0 ms
2017/12/11 17:54:42.868 INFO  com.pythian.Beam$MyDoFn - Getting new data={"id":6,"text":"row6"}
2017/12/11 17:54:42.869 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=6, data=row6}
2017/12/11 17:54:42.871 INFO  com.pythian.Beam$MyDoFn - Processed data in 1 ms

As expected, the new code was applied and later messages were processed much faster than the first one.

Please note that, as with any other dynamic code approach, you have to take care of permissions for the “control” topic.

You can find the code here.

In the second part of this series we will develop a pipeline to transform messages from the “data” Pub/Sub topic, with the ability to control the process via a “control” topic.

How to effectively pass changing (non-immutable) input into a DoFn is not obvious, but there is a clue in the documentation:

If the side input has multiple trigger firings, Beam uses the value from the latest trigger firing. This is particularly useful if you use a side input with a single global window and specify a trigger.

With this in hand, we can use a construction like this:

  val sideView = p.apply(PubsubIO.readStrings()
      .fromSubscription(fullSideInputSubscriptionName))
    .apply(Window.into[String](new GlobalWindows())
      .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
      .discardingFiredPanes())
    .apply(View.asList())

We have a single global window, fire the trigger for each incoming row, keep only the new values, and convert the result into a view so it can be used as a side input.

A couple of things to note here. First of all, you have to keep in mind that the system is eventually consistent and doesn’t preserve message order, so in some cases you may want to use accumulatingFiredPanes instead. Secondly, a trigger firing is supposed to provide at least one value, but it may theoretically provide several, and if you use View.asSingleton your pipeline will fail. The last thing to note is that the pipeline will wait for the first message from the side input before it starts processing the steps that use it.
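
If you do want each firing to carry everything seen so far rather than only the newest values, the accumulating variant would look roughly like this (a sketch of mine based on the snippet above, not code from the original pipeline):

  val sideView = p.apply(PubsubIO.readStrings()
      .fromSubscription(fullSideInputSubscriptionName))
    .apply(Window.into[String](new GlobalWindows())
      .triggering(Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
      .accumulatingFiredPanes())
    .apply(View.asList())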

Now we change our template slightly to use sideView in it:

  p.apply(PubsubIO.readStrings().fromSubscription(fullSubscriptionName))
    .apply(ParDo.of(new MyDoFn(sideView)).withSideInputs(sideView))
    .apply(BigQueryIO
      .writeTableRows()
      .to(targetTable)
      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER))

The DoFn uses this side input as a switch to either process the data or ignore incoming messages:

  class MyDoFn(sideView: PCollectionView[java.util.List[String]]) extends DoFn[String, TableRow] with LazyLogging {
    @ProcessElement
    def processElement(c: ProcessContext) {
      val sideInput = c.sideInput(sideView).get(0)
      val inputString = c.element()
      if (sideInput == "ENABLED") {
        ...
      } else {
        logger.info(s"Ignoring input messages, sideInput=$sideInput")
      }
    }
  }

To test our pipeline we will send a sequence of messages into both Pub/Sub topics. Start the pipeline locally:

$ sbt run
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from C:\Users\Valentin\workspace\beam-sideinput\project
[info] Loading settings from build.sbt ...
[info] Set current project to beam-sideinput (in build file:/C:/Users/Valentin/workspace/beam-sideinput/)
[info] Running com.pythian.Beam

Send a few messages to the “data” topic:

{"id":1,"text":"row1"}
{"id":2,"text":"row2"}
{"id":3,"text":"row3"}

Nothing is processed or ignored in the logs yet, because we haven’t published anything into the “control” topic.
Now let’s publish "ENABLED" to the “control” topic. In a while we will see the three rows above processed:

2017/12/11 14:43:09.773 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=2, data=row2}
2017/12/11 14:43:09.773 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=3, data=row3}
2017/12/11 14:43:09.773 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=1, data=row1}

Send one more message to the “data” topic; it is processed as expected:

{"id":4,"text":"row4"}
...
2017/12/11 14:44:42.805 INFO  com.pythian.Beam$MyDoFn - Inserting to BiqQuery: {id=4, data=row4}

Now send "DISABLED" to the “control” topic and try yet another message to the “data” topic:

{"id":5,"text":"row5"}
...
2017/12/11 14:46:32.097 INFO  com.pythian.Beam$MyDoFn - Ignoring input messages, sideInput=DISABLED

Exactly as expected. Note that with this pipeline running on Dataflow there may be a propagation delay, so some new “data” messages may still be processed after you have published the disabling message.

You can find the code here.

In the third part of this series we will build a more powerful pipeline that uses side input messages as source code for data processors. This can be helpful for building flexible pipelines that can be tuned on the fly without restarting them.

In this 3-part series I’ll show you how to build and run Apache Beam pipelines using the Java API from Scala.

In the first part we will develop the simplest streaming pipeline that reads JSON messages from Google Cloud Pub/Sub, converts them into TableRow objects and inserts them into a Google BigQuery table. Then we will run our pipeline with sbt on the local runner, and afterwards deploy it on Google Cloud Platform (GCP).

Prerequisites
You need a GCP project with the APIs for Dataflow, Pub/Sub and BigQuery enabled. Perform gcloud auth login, or create a service account with the proper permissions and download its key.json locally, and install a JDK and sbt. You also have to create a Pub/Sub topic and subscription, the target BigQuery table and a temp location in Google Cloud Storage.

Building pipeline
First of all you need to create an instance of the pipeline options. You may either use the fromArgs method or set the parameters manually.

  trait TestOptions extends PipelineOptions with DataflowPipelineOptions
  val options = PipelineOptionsFactory.create().as(classOf[TestOptions])
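
For instance, setting a few of the parameters manually might look like the sketch below (the values are placeholders of mine; the setters come from the option interfaces mixed into DataflowPipelineOptions):

  options.setProject(projectId)
  options.setTempLocation("gs://my-bucket/temp")
  options.setStreaming(true)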

The next thing to do is to define the input subscription name and the output table reference object for the pipeline I/O.

  val fullSubscriptionName = 
    s"projects/$projectId/subscriptions/$subscription"
  val targetTable = 
    new TableReference()
      .setProjectId(projectId)
      .setDatasetId(dataset)
      .setTableId(tableName)

Now we can describe our DoFn. It processes JSON string messages, trying to convert them into TableRow objects:

  class MyDoFn extends DoFn[String, TableRow] with LazyLogging {
    @ProcessElement
    def processElement(c: ProcessContext) {
      val inputString = c.element()
      logger.info(s"Received message: $inputString")
      Try {
        Transport.getJsonFactory.fromString(inputString, classOf[TableRow])
      } match {
        case Success(row) =>
          logger.info(s"Converted to TableRow: $row")
          c.output(row)
        case Failure(ex) =>
          logger.info(s"Unable to parse message: $inputString", ex)
      }
    }
  }

The last thing to do is to combine all the parts together using the pipeline object:

  val p = Pipeline.create(options)
  p.apply("read-pubsub", PubsubIO
      .readStrings()
      .fromSubscription(fullSubscriptionName))
    .apply("process", ParDo.of(new MyDoFn))
    .apply("write-bq", BigQueryIO
      .writeTableRows()
      .to(targetTable)
      .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
      .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER))

And we are ready to run it:

  p.run()

Run pipeline locally

To start your pipeline locally you need to specify the DirectRunner in the pipeline options.
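
Setting the runner programmatically might look like this one-liner (a sketch of mine; it could equally be supplied via command-line arguments):

  options.setRunner(classOf[org.apache.beam.runners.direct.DirectRunner])

Then you can simply start the pipeline with the sbt run command: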

$ sbt run
[info] Loading settings from plugins.sbt ...
[info] Loading project definition from C:\Users\Valentin\workspace\beam-template\project
[info] Loading settings from build.sbt ...
[info] Set current project to beam-template (in build file:/C:/Users/Valentin/workspace/beam-template/)
[info] Running com.pythian.Beam

Then you can publish a message from the cloud console into your topic to test it:

{ "id": 1, "data": "test data" }

In a while you should see something like:

2017/12/11 01:54:16.581 INFO  com.pythian.Beam$MyDoFn - Received message: { "id": 1, "data": "test data" }
2017/12/11 01:54:16.588 INFO  com.pythian.Beam$MyDoFn - Converted to TableRow: {"id":1,"data":"test data"}

You can now select your row from BigQuery (note that the table preview won’t show rows that are still in the streaming buffer):

select * from test_nikotin.test where id = 1

Run pipeline in DataFlow

Once you are done with your tests, you are ready to start it on GCP. Configure the runner to DataflowRunner.
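
A sketch of that switch (the staging bucket below is a placeholder of mine):

  options.setRunner(classOf[org.apache.beam.runners.dataflow.DataflowRunner])
  options.setStagingLocation("gs://my-bucket/staging")

Then run sbt: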

$ sbt run
[info] Loading settings from plugins.sbt ...
...
[info] Running com.pythian.Beam
...
2017/12/11 01:50:04.937 INFO  o.a.b.r.dataflow.util.PackageUtil - Uploading 112 files from PipelineOptions.filesToStage to staging location to prepare for execution.
2017/12/11 01:50:09.093 INFO  o.a.b.r.dataflow.util.PackageUtil - Staging files complete: 111 files cached, 1 files newly uploaded
2017/12/11 01:50:09.196 INFO  o.a.b.r.d.DataflowPipelineTranslator - Adding read-pubsub/PubsubUnboundedSource as step s1
...
Dataflow SDK version: 2.1.0
2017/12/11 01:50:11.064 INFO  o.a.b.r.dataflow.DataflowRunner - To access the Dataflow monitoring console, please navigate to https://console.developers.google.com/project/myproject/dataflow/job/2017-12-10_14_50_12-10326138943752681303
Submitted job: 2017-12-10_14_50_12-10326138943752681303
2017/12/11 01:50:11.064 INFO  o.a.b.r.dataflow.DataflowRunner - To cancel the job using the 'gcloud' tool, run:
> gcloud beta dataflow jobs --project=myproject cancel 2017-12-10_14_50_12-10326138943752681303
[success] Total time: 17 s, completed Dec 11, 2017 1:50:11 AM

You can navigate to the Dataflow service in the cloud console to verify it’s running as expected, and check the logs in Cloud Logging.

You can find the code here.

In the second part I’ll build a pipeline with control flow from another Pub/Sub topic via side input.

Data as a Data Scientist Recruitment Tool - 12-Dec-2017 08:02 - Pythian

Everyone knows that data scientists are in high demand and recruiting them is very challenging these days, especially if you’re not a huge sexy brand-name company. What are these data scientists looking for in a company and how can you make sure you can deliver what they want?

I was talking to a client yesterday about his data scientist recruiting challenges and he told me a story that really resonated. He works for a hugely successful software company; his office offers that wonderful mix of fun without being over-the-top, and it’s located in a great part of town. Everyone I met was incredibly nice and looked happy to be there. They are VERY committed to differentiating themselves based on data, from the CEO down. What could be better than that as a setup for recruiting a data scientist?

He told me that they had just been turned down by two data scientist candidates, not for any reason you’d expect, but because the company couldn’t convince the candidates that their data was good enough to support a productive data science initiative. That was the first time I had heard that, but it makes total sense. Talk to any data scientist about their frustrations at work and most of them won’t talk about their internal clients or whether the company values their work; they’ll talk about how much time they spend finding, cleaning and integrating data. They’ll talk about not having access to enough data and about how long it takes to run their models.

So if you have a data lake with all the data in it, if your data is clean, and if your data systems are powerful enough to deliver results within a reasonable time, you should make this a big part of your recruitment process. I know from experience that most companies are not yet this mature in their analytics platforms, so you’ll have a real recruiting edge.

And of course the converse is also true. If you don’t have a well-architected data platform and you want to move into advanced analytics, start addressing this now. Not only will you be able to attract the best talent, you’ll also make them much more productive, and when they deliver results sooner, you win too.

A secure distributed ledger with smart contract capabilities, not requiring a bank as an intermediary! Also a single source of truth with complete traceability. Definitely something we want! Blockchain technology promises to make this possible. Blockchain became famous through cryptocurrencies like Bitcoin and Ethereum. The technology could also be considered as a replacement for B2B functionality. With new technologies it is not a bad idea to look at the pros and cons before starting an implementation. Blockchain is the new kid on the block, and there is not much experience yet with how well it will play with others and how it will mature. In this blog I summarize some of my concerns about blockchain, which I hope will be solved in due time.

Regarding new and emerging technologies in the integration space, I’m quite open to investigating the potential value they can offer. I’m a great proponent of, for example, Kafka, the highly scalable streaming platform, and Docker for hosting microservices. However, I’ve been to several conferences and done some research online regarding blockchain, and I’m sceptical. I definitely don’t claim to be an expert on this subject, so please correct me if I’m wrong! Also, this is my personal opinion; it might deviate from my employer’s and customers’ views.

Most of the issues discussed here are valid for public blockchains. Private blockchains are of course more flexible since they can be managed by companies themselves. You can for example more easily migrate private blockchains to a new blockchain technology or fix issues with broken smart contracts. These do require management tooling, scripts and enough developers / operations people around your private blockchain though. I don’t think it is a deploy and go solution just yet.

1 Immutable is really immutable!

A pure public blockchain (not taking into account sidechains and off-chain code) is an immutable chain. Every block includes a hash of the previous block, so you cannot alter a block which is already on the chain. This makes sure things you put on the chain cannot suddenly appear or disappear, and there is traceability. Thus you cannot accidentally create money, for example, on a distributed ledger (unless you create immutable smart contracts to provide you with that functionality). Security and immutability are great things, but they require you to work in a certain way we are not that used to yet. For example, you cannot cancel a confirmed transaction; you have to do a new transaction counteracting the effects of the previous one you want to cancel. If you have an unconfirmed transaction, you can ‘cancel’ it by creating a new transaction with the same inputs and a higher transaction fee (at least on a public blockchain). See for example here. Also, if you put a smart contract on a public chain and it has a code flaw someone can abuse, you’re basically screwed. If the issue is big enough, public blockchains can fork (if ‘the community’ agrees); see for example the DAO hack on Ethereum. In an enterprise environment with a private blockchain, you can fork the chain and replay the transactions after the issue you want corrected on the chain. This however needs to be performed for every serious enough issue and can be a time-consuming operation. In this case it helps (in your private blockchain) if you have a ‘shadow administration’ of transactions. You do have to take into account, however, that transactions can have different results based on what has changed since the fork, so being careful here is probably required.

2 Smart Contracts

Smart contracts! It is really cool you can also put a contract on the chain. Execution of the contract can be verified by nodes on the chain which have permission and the contract is immutable. This is a cool feature!

However there are some challenges when implementing smart contracts. A lot becomes possible and this freedom creates sometimes unwanted side-effects.

CryptoKitties

You can look up CryptoKitties, a game implemented using smart contracts on Ethereum. It can clog a public blockchain and cause transactions to take a really long time. This is not the first time blockchain congestion has occurred (see for example here). This is a clear sign there are scalability issues, especially with public blockchains. When using private blockchains, these scalability issues are also likely to occur eventually if the number of transactions increases (of course you can prevent CryptoKitties on a private blockchain). The Bitcoin / VISA comparison is an often-quoted one, although there is much discussion on the validity of the comparison.

Immutable software. HelloWorld forever!

Smart contracts are implemented in code, and code contains bugs; depending on the implementation, those bugs sometimes cannot be fixed, since the code on the chain is immutable. Especially since blockchain is a new technology, many people will put buggy code on public blockchains, and that code will remain there forever. If you create DAOs (decentralized autonomous organizations) on a blockchain, this becomes even more challenging since the codebase is larger. See for example the Ethereum DAO hack.

Because the code is immutable, it will remain on the chain forever. Every hello world tryout, every CryptoKitten from everyone will remain there. Downloading the chain and becoming a node will thus become more difficult as the amount of code on the chain increases, which it undoubtedly will.

Business people creating smart contracts?

The term smart contract might give the idea that a business person or lawyer should be able to design and create them. If they can create deterministic, error-free contracts that will be on the blockchain forever, that is of course possible; it is a question, though, how realistic that is. It is similar to the idea that business people could create business rules in business rule engines (‘citizen developers’). In my experience, technical people need to do that in a controlled, tested manner.

3 There is no intermediary and no guarantees

There is no bank between you and the (public) blockchain. This can be a good thing, since a bank eats money. However, if for example the blockchain loses popularity, steeply drops in value or is hacked (compare it with a bank going bankrupt, e.g. Icesave), then you won’t have any guarantees like, for example, the deposit guarantee schemes in the EU. Your money might be gone.

4 Updating the code of a blockchain

Updating the core code of a running blockchain is, due to its distributed nature, quite a challenge. This often leads to forks. See for example Bitcoin forks like Bitcoin Cash and Bitcoin Gold, and an Ethereum fork like Byzantium. The issue with forks is that they make the entire cryptocurrency landscape crowded. It is like Europe in the past, when every country had its own coin: you have to exchange coins if you want to spend in a certain country (using the intermediaries everyone wants to avoid) or keep a stack of each of them. Forks, especially hard forks, come with security challenges such as replay attacks (transactions that can be valid on different chains). Some reasons you might want to update the code are that transactions are slow, that security becomes an issue in the future (quantum computing) or that new features are required (e.g. related to smart contracts).

5 Blockchain and privacy legislation (GDPR)

Security

Security is one of the strong points of blockchain technology and helps with the security by design and by default GDPR requirements. There are some other things to think about though.

The right to be forgotten

Things put on a blockchain are permanent. You cannot delete them afterwards, although you might be able to make them inaccessible in certain cases. This conflicts with the GDPR right to be forgotten.

Data localization requirements

Every node has the entire blockchain and thus all the data. This might cause issues with legislation, for example requirements to keep data within the same country. This becomes more of a challenge when running a blockchain in a cloud environment. In Europe, with its many relatively small countries, this will be more of an issue than in, for example, the US, Russia or China.

Blockchain in the cloud

It really depends on the types of services the blockchain cloud provider offers and how much they charge for them. It could be similar to using a bank, requiring you to pay per transaction; in that case, why not stick with a bank? Can you enforce that the nodes are located in your country? If you need to fix a broken smart contract, will there be a service request, and will the cloud provider fork and replay transactions for you? Will you get access to the blockchain itself? Will they provide a transaction manager? Will they guarantee a maximum number of transactions per second in their SLA? A lot of questions, for which there are probably answers (which differ per provider), and based on those answers you can make a cost calculation to decide whether using a cloud blockchain will be worthwhile. In the cloud, the challenges with being GDPR compliant are even greater (especially for European governments and banks).

6 Lost your private key?

If you have lost your private key, or lost access to the wallet (a more business-friendly name for a keystore) containing your private key, you might have lost your assets on the blockchain. Luckily a blockchain is secure, so there is no easy way to fix this. If you have a wallet which is managed by a 3rd party, they might be able to help you recover it. Those 3rd parties, however, are hacked quite often (a lot of value can be obtained from such a hack). See for example here, here and here.

7 A blockchain transaction manager is required

A transaction is put on the blockchain. The transaction is usually verified by several nodes before it is distributed to all nodes and becomes part of the chain. Verification can fail or might take a while; this can be hours on some public blockchains. It could be that the transaction has been overtaken by another transaction with a higher priority. In the software which is integr

Docker : My First Steps - 12-Dec-2017 02:51 - Tim Hall

In a blog post after OpenWorld I mentioned I might not be writing so much for a while as something at work was taking a lot of my “home time”, which might result in some articles, but then again might not… Well, that something was Docker…

After spending a couple of years saying I was going to start looking at Docker, in June I wrote a couple of articles, put them on the website, but didn’t mention them to anyone.  I was finding it quite hard to focus on Docker because of all the fun I was having with ORDS. More recently it became apparent that we have a couple of use-cases for Docker at work, one of which involved ORDS, so it reignited my interest. There’s nothing like actually needing to use something to make you knuckle down and learn it… 🙂

Having gone back to revisit Docker, I realised the two articles I wrote were terrible, which wasn’t surprising considering how little time I had spent using Docker at that point. The more I used Docker, the more I realised I had totally missed the point. I had come to it with too many preconceptions, mostly relating to virtualization, that were leading me astray. I reached out to a few people (Gerald Venzl, Bruno Borges & Avi Miller) for help and advice, which got me back on track…

I’ve been playing around with Docker a lot lately, which has resulted in a few articles, with some more on the way. I’m not trying to make out I’m “the Docker guy” now, because I’m clearly not. I’m not suggesting you use my Docker builds, because there are better ones around, like these. I’m just trying to learn this stuff and I do that by playing and writing. If other people find that useful and want to follow me on the journey, that’s great. If you prefer to go straight to the source (docs.docker.com) that’s probably a better idea. 🙂

I do a lot of rewrites of articles on my website in general. This is especially true of these Docker articles, which seem to be in a permanent state of flux at the moment. Part of me wanted to wait until I was a little more confident about it all, because I didn’t want to make all my mistakes in public, then part of me thought, “sod it!”

If you want to see what I’ve been doing all the articles are on my website and the Dockerfiles on Github.

I’m having a lot of fun playing around with Docker. You could say, I’m having a “whale” of a time! (I’ll get my coat…)

Cheers

Tim…


Docker : My First Steps was first posted on December 12, 2017 at 8:51 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

While opening the database, you may get the ORA-00600 error below. Whenever an ORA-00600 error is raised, a trace file is generated and an entry is written to the alert.log with details of the trace file location. As of Oracle 11g, the database includes an advanced fault diagnosability infrastructure to manage trace data.

Problem:

ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]

Solution:

1. Check the Alert Log

The alert log may indicate additional errors or other internal errors at the time of the problem. Focus your analysis of the problem on the first internal error in the sequence. There are some exceptions, but often additional internal errors are side-effects of the first error condition.

The associated trace file may be truncated if the MAX_DUMP_FILE_SIZE parameter is not set high enough or to 'unlimited'. If you see the following message at the end of the trace file

“MAX DUMP FILE SIZE EXCEEDED”

there could be vital diagnostic information missing in the file and finding the root issue may be very difficult. Set the MAX_DUMP_FILE_SIZE appropriately and regenerate the error for complete trace information.

2. Search 600/7445 Lookup Tool

Visit My Oracle Support to access the ORA-00600 Lookup Tool (Note 600.1). The ORA-600/ORA-7445 Lookup Tool may lead you to applicable content in My Oracle Support about the problem. It can be used to investigate the problem with the argument data from the error message, or you can pull out key stack pointers from the associated trace file to match against known bugs.

3. Investigate reference material on the error

In the search tool from above, choose Error Code ORA-600 and enter the first argument number or character string. Click on the 'Lookup error' button to review the reference note for your error. The reference note will provide a description of the error and may point to specific notes or bugs that possibly cause the error.

In this case, the error appeared while starting the database:

SQL>startup
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.

Total System Global Area 8754618368 bytes
Fixed Size 4646288 bytes
Variable Size 4160756336 bytes
Database Buffers 4429185024 bytes
Redo Buffers 160030720 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr]

Solution:

1. Start the database in mount stage:

 

SQL>Startup mount ;
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Total System Global Area 8754618368 bytes
Fixed Size 4646288 bytes
Variable Size 4160756336 bytes
Database Buffers 4429185024 bytes
Redo Buffers 160030720 bytes
Database mounted.

2. Get the controlfile and current redo log file locations:

SQL[SYS@BBCRMST1]SQL>>] Show parameter control_files

NAME            TYPE    VALUE
--------------- ------- ------------------------------------------------------------------------------------
control_files   string  /u03/oracle/oradata/BBCRMST1/control01.ctl, /u03/oracle/oradata/BBCRMST1/control02.ctl

SQL[SYS@BBCRMST1]SQL>>] select a.member, a.group#, b.status from v$logfile a, v$log b where a.group#=b.group# and b.status='CURRENT';

MEMBER                                      GROUP#  STATUS
------------------------------------------  ------  ----------------
/u03/oracle/oradata/BBCRMST1/redo06.log          6  CURRENT

3. Shut down the database and take a physical backup of the controlfiles:

shutdown abort

cp /u03/oracle/oradata/BBCRMST1/control01.ctl /u03/oracle/oradata/BBCRMST1/control01.ctl_bkup
cp /u03/oracle/oradata/BBCRMST1/control02.ctl /u03/oracle/oradata/BBCRMST1/control02.ctl_bkup

4. startup mount

Startup mount ;

5. Recover the database by applying the current redo log file (supply its path when prompted):

recover database using backup controlfile until cancel ;
/u03/oracle/oradata/BBCRMST1/redo06.log

SQL> Alter database open resetlogs ;

 

The post ORA-00600 Kcratr_nab_less_than_odr While Starting The Database appeared first on ORACLE-HELP.

This post explains the steps a DBA can use when facing error ORA-27054. For beginners, we start with a few lines that explain the technical background of NFS.

Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call (ONC RPC) system. NFS is an open standard defined in a Request for Comments (RFC), allowing anyone to implement the protocol.

While running expdp with the dumpfile on an NFS mount point, you may receive the error below.

Error:

ORA-27054: NFS file system where the file is created or resides is not mounted with correct options

Cause: The file was on an NFS partition and either reading the mount tab file failed or the partition was not mounted with the correct mount option.

Solution:

Run the statement below as sysdba:

SQL> alter system set events '10298 trace name context forever, level 32';

System altered

Now trigger the expdp again.

oracle@crmcpredb1:/expdmp/LOY900$ expdp parfile=exp_loy_900_crmpre.par

Export: Release 11.2.0.4.0 – Production on Mon Apr 18 12:39:34 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Username: / as sysdba

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Starting "SYS"."SYS_EXPORT_TABLE_08": /******** AS SYSDBA parfile=exp_loy_900_crmpre.par
An estimate of progress using BLOCKS method…

 

 

The post ORA-27054: NFS File System Where The File Is Created While Expdp appeared first on ORACLE-HELP.
