Linux News

Australia does not need an NBN network for faster network speeds?

Securitron - Wed, 02/15/2017 - 16:32
The head of the NBN company has claimed that Australians would not need or use super-fast NBN download speeds, even if they were offered for free. This is despite the massive adoption of online gaming and Netflix video streaming, as well as the downloading of 10-gigabyte game patches. Steam uses a lot of bandwidth …

Categories: Linux News

GNOME Maps 3.24 To Support Transit Routing

Phoronix - Wed, 02/15/2017 - 15:47
GNOME Maps has become a much more viable piece of software with transit routing support having landed in Git master...
Categories: Linux News

HHVM 3.18 Released With Garbage Collection Options, Ubuntu 16.10 Support

Phoronix - Wed, 02/15/2017 - 15:12
Facebook's team working on HHVM, their high-performance implementation of PHP and also what's used by their Hack language, is now up to version 3.18...
Categories: Linux News

DuckDuckGo Ups Ante: Gives $300K to 'Raise the Standard of Trust'

Linux Today - Wed, 02/15/2017 - 15:00

For the seventh year in a row, the search engine that promises not to stalk your online moves puts its money where its mouth is, this year by donating $300,000 to organizations that work towards online privacy.

Categories: Linux News

How to install PHP 7 on Debian Linux 8.7/7.x [jessie/wheezy]

Linux Today - Wed, 02/15/2017 - 13:00

I wanted to use PHP 7 on Debian 8.x. How do I install and configure PHP 7 on Debian Linux 8.x server?

Categories: Linux News

Deploying Kubernetes on Bare Metal

Canonical - Wed, 02/15/2017 - 12:45

Fast forward to 6 minutes, 42 seconds to begin the demo.

In this demo Marco Ceppi deploys a fully functional Kubernetes cluster on 10 nodes.

If you’re interested in bare metal Kubernetes, we invite you to join us and other contributors in the sig-onprem community.

Not sure how to get started? Join us in our Getting started with the Canonical Distribution of Kubernetes webinar on the 22nd of February.

Categories: Linux News

Linux Foundation and NCWIT release free Inclusive Speaker Orientation course

Linux Today - Wed, 02/15/2017 - 12:30

At the Open Source Leadership Summit in Tahoe today, the Linux Foundation and the National Center for Women & Information Technology (NCWIT) announced a free new online course for event speakers.

Categories: Linux News

How to Install Firefox Nightly as a Flatpak App on Ubuntu

OMG Ubuntu - Wed, 02/15/2017 - 11:51

We show you how to set-up Flatpak on Ubuntu to install the latest Firefox Nightly build. If you've been keen to try Flatpak out, here's how!

This post, How to Install Firefox Nightly as a Flatpak App on Ubuntu, was written by Joey Sneddon and first appeared on OMG! Ubuntu!.

Categories: Linux News

Radeon Windows 10 vs. Linux RadeonSI/RADV Gaming Performance

Phoronix - Wed, 02/15/2017 - 11:00
On Monday I published a Windows 10 vs. Ubuntu Linux gaming performance comparison with NVIDIA GeForce graphics; today the tables have turned, with a Windows vs. Linux gaming benchmark battle on AMD Radeon graphics.
Categories: Linux News

Best Windows Like Linux Distributions For New Linux Users

Linux Today - Wed, 02/15/2017 - 11:00

Hey new Linux users, you may be wondering which Linux distro to choose after seeing so many distros based on Linux

Categories: Linux News

Unreal Engine 4.15 Released: Improved Vulkan Support

Phoronix - Wed, 02/15/2017 - 10:44
Epic Games announced the release this morning of Unreal Engine 4.15...
Categories: Linux News

CloudStats - Best Server Monitoring Tool for Linux Servers

NoobsLab - Wed, 02/15/2017 - 10:30
CloudStats is an effective tool for Linux server monitoring and network monitoring. With CloudStats you get full visibility into the key performance metrics of your Linux server. You can proactively track server metrics such as CPU, disk and memory usage, services, apps, processes and more. The best thing is that you don’t need any special technical skills: this server monitoring tool is very easy to install and run from any device.
Server Monitoring
It takes only one SSH command to run the CloudStats server monitoring tool on your server and get all your server statistics in one place. After synchronization with your Linux server, you can keep your entire virtual infrastructure under control. You’ll get information about your servers and networks, including CPU, disk, RAM and network usage. You can also monitor Apache, MySQL, mail, FTP, DNS and other services. System alerts and notifications will help you detect and fix server failures promptly and prevent downtime.

How does it work?
  • Sign in and install the server monitoring agent in the server.
  • The agent will collect critical metrics about your servers and networks.
  • Get reports about your system status and receive notifications via Email, Skype or Slack.
  • Manage services from your home or office PC or a mobile device.
  • Back up your data on a regular basis.
With CloudStats you can monitor your servers from anywhere in the world. It runs on Microsoft Azure, which helps keep its monitoring results correct and up to date.

Here is a list of features of CloudStats:
  • Linux Server Monitoring;
  • Data backup tool;
  • Network traffic monitoring;
  • Services monitoring;
  • Process monitoring;
  • External checks;
  • Website monitoring and PingMap;
  • Email, Skype and Slack Alerts;
  • Free account available

CloudStats Server Monitoring service is a one-stop-shop solution to monitor, backup and manage your whole IT infrastructure, no matter how many websites, servers and cloud instances you have and where they are located. This Software-as-a-Service tool is suitable both for business and personal use.
You can sign up for a free personal package for up to 10 servers, websites and IP addresses today!
Categories: Linux News

GPUs & Kubernetes for Deep Learning — Part 1/3

Canonical - Wed, 02/15/2017 - 09:02

A few weeks ago I shared a side project about building a DIY GPU cluster for k8s, to play with Kubernetes with a proper ROI vs. AWS g2 instances.

This was spectacularly interesting when AWS was lagging behind with old NVIDIA GRID K520 cards (which are no longer supported by the latest drivers). But with the addition of the P series (p2.xlarge, 8xlarge and 16xlarge), the new cards are K80s with 12GB RAM, outrageously more powerful than the previous ones.

Baidu just released a post on the Kubernetes blog about the PaddlePaddle setup, but it only focused on CPUs. I thought it would be interesting to look at a setup of Kubernetes on AWS with some GPU nodes, then exercise a Deep Learning framework on it. The docs say it is possible…

This post is the first in a series of three: setting up the GPU cluster (this post), adding storage to a Kubernetes cluster (right afterwards), and finally running a Deep Learning training job on the cluster (in progress, coming after MWC…).

The Plan

In this blog, we will:

  1. Deploy k8s on AWS in a development mode (no HA, colocating etcd, the control plane and PKI)
  2. Deploy 2x nodes with GPUs (p2.xlarge and p2.8xlarge instances)
  3. Deploy 3x nodes with CPU only (m4.2xlarge)
  4. Validate GPU availability

For what follows, it is important that:

  • You understand Kubernetes 101
  • You have admin credentials for AWS
  • If you followed the other posts, you know we’ll be using the Canonical Distribution of Kubernetes, hence some knowledge about Ubuntu, Juju and the rest of Canonical’s ecosystem will help.
  • Make sure you have Juju installed.

On Ubuntu:

sudo apt-add-repository ppa:juju/stable
sudo apt update
sudo apt install -yqq juju

For other OSes, look up the official docs.

Then to connect to the AWS cloud with your credentials, read this page

  • Finally copy this repo to have access to all the sources
git clone ./
cd blogposts/k8s-gpu-cloud

OK! Let’s start GPU-izing the world!

Deploying the cluster

Bootstrap

As usual, start with the bootstrap sequence. Just be careful: p2 instances are only available in us-west-2, us-east-1 and eu-west-1, as well as the us-gov regions. I have experienced issues running p2 instances on the EU side, hence I recommend using a US region.

juju bootstrap aws/us-east-1 --credential canonical --constraints "cores=4 mem=16G root-disk=64G"
# Creating Juju controller "aws-us-east-1" on aws/us-east-1
# Looking for packaged Juju agent version 2.1-rc1 for amd64
# Launching controller instance(s) on aws/us-east-1…
#  - i-0d48b2c872d579818 (arch=amd64 mem=16G cores=4)
# Fetching Juju GUI 2.3.0
# Waiting for address
# Attempting to connect to
# Attempting to connect to
# Logging to /var/log/cloud-init-output.log on the bootstrap machine
# Running apt-get update
# Running apt-get upgrade
# Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
# Fetching Juju agent version 2.1-rc1 for amd64
# Installing Juju machine agent
# Starting Juju machine agent (service jujud-machine-0)
# Bootstrap agent now started
# Contacting Juju controller at to verify accessibility…
# Bootstrap complete, "aws-us-east-1" controller now available.
# Controller machines are in the "controller" model.
# Initial model "default" added.

Deploying instances

Once the controller is ready we can start deploying services. In my previous posts, I used bundles which are shortcuts to deploy complex apps.

If you are already familiar with Juju, you can run juju deploy src/k8s-gpu.yaml and jump to the end of this section. For those interested in the details, this time we will deploy manually and go through the logic of the deployment.

Kubernetes is made of 5 individual applications: Master, Worker, Flannel (network), etcd (cluster state storage DB) and easyRSA (PKI to encrypt communication and provide x509 certs).

In Juju, each app is modeled by a charm, which is a recipe of how to deploy it.

At deployment time, you can give constraints to Juju, either very specific (instance type) or loose (number of cores). With the latter, Juju will pick the cheapest instance matching your constraints on the target cloud.

First thing is to deploy the applications:

juju deploy cs:~containers/kubernetes-master-11 --constraints "cores=4 mem=8G root-disk=32G"
# Located charm "cs:~containers/kubernetes-master-11".
# Deploying charm "cs:~containers/kubernetes-master-11".
juju deploy cs:~containers/etcd-23 --to 0
# Located charm "cs:~containers/etcd-23".
# Deploying charm "cs:~containers/etcd-23".
juju deploy cs:~containers/easyrsa-6 --to lxd:0
# Located charm "cs:~containers/easyrsa-6".
# Deploying charm "cs:~containers/easyrsa-6".
juju deploy cs:~containers/flannel-10
# Located charm "cs:~containers/flannel-10".
# Deploying charm "cs:~containers/flannel-10".
juju deploy cs:~containers/kubernetes-worker-13 --constraints "instance-type=p2.xlarge" kubernetes-worker-gpu
# Located charm "cs:~containers/kubernetes-worker-13".
# Deploying charm "cs:~containers/kubernetes-worker-13".
juju deploy cs:~containers/kubernetes-worker-13 --constraints "instance-type=p2.8xlarge" kubernetes-worker-gpu8
# Located charm "cs:~containers/kubernetes-worker-13".
# Deploying charm "cs:~containers/kubernetes-worker-13".
juju deploy cs:~containers/kubernetes-worker-13 --constraints "instance-type=m4.2xlarge" -n3 kubernetes-worker-cpu
# Located charm "cs:~containers/kubernetes-worker-13".
# Deploying charm "cs:~containers/kubernetes-worker-13".

Here you can see an interesting property of Juju that we have not covered before: naming the applications you deploy. We deployed the same kubernetes-worker charm three times, twice with GPUs and once without. This gives us a way to group instances of a certain type, at the cost of duplicating some commands.
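One way to limit that duplication is to drive the worker deployments from a small table. The following is a minimal dry-run sketch, not part of the original procedure: print_worker_deploys is a hypothetical helper name, the table simply repeats the application names, instance types and unit counts used in the commands above, and the function only prints the commands rather than running them.

```shell
# Charm revision taken from the deploy commands above.
CHARM="cs:~containers/kubernetes-worker-13"

# Hypothetical helper: print one "juju deploy" line per worker group.
# Each table row is "<application name>:<instance type>:<unit count>".
print_worker_deploys() {
  while IFS=: read -r name type count; do
    echo "juju deploy ${CHARM} --constraints \"instance-type=${type}\" -n${count} ${name}"
  done <<EOF
kubernetes-worker-gpu:p2.xlarge:1
kubernetes-worker-gpu8:p2.8xlarge:1
kubernetes-worker-cpu:m4.2xlarge:3
EOF
}

# Dry run: review the printed commands before executing any of them.
print_worker_deploys
```

Piping the output to sh would run the commands, but reviewing the printed lines first is the safer habit when each one starts a billable instance.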

Also note the revision numbers in the charms we deploy. Revisions are not directly tied to versions of the software they deploy. If you omit them, Juju will pick the latest revision, as Docker would for images.

Adding the relations & Exposing software

Now that the applications are deployed, we need to tell Juju how they are related together. For example, the Kubernetes master needs certificates to secure its API. Therefore, there is a relation between the kubernetes-master:certificates and easyrsa:client.

This relation means that once the two applications are connected, some scripts will run to query the EasyRSA API to create the required certificates, then copy them to the right location on the k8s master.

These relations then create statuses in the cluster, to which charms can react.

Essentially, at a very high level, think of Juju as a pub-sub implementation of application deployment. Every action inside or outside of the cluster posts a message to a common bus, and charms can react to these messages and perform additional actions, modifying the overall state… and so on until equilibrium is reached.

Let’s add the relations:

juju add-relation kubernetes-master:certificates easyrsa:client
juju add-relation etcd:certificates easyrsa:client
juju add-relation kubernetes-master:etcd etcd:db
juju add-relation flannel:etcd etcd:db
juju add-relation flannel:cni kubernetes-master:cni
for TYPE in cpu gpu gpu8
do
  juju add-relation kubernetes-worker-${TYPE}:kube-api-endpoint kubernetes-master:kube-api-endpoint
  juju add-relation kubernetes-master:cluster-dns kubernetes-worker-${TYPE}:kube-dns
  juju add-relation kubernetes-worker-${TYPE}:certificates easyrsa:client
  juju add-relation flannel:cni kubernetes-worker-${TYPE}:cni
  juju expose kubernetes-worker-${TYPE}
done
juju expose kubernetes-master

Note the expose commands at the end.

These are instructions for Juju to open up the firewall in the cloud for specific ports of the instances. Some are predefined in charms (the Kubernetes master API is on 6443, and workers open up 80 and 443 for ingresses), but you can also force them if you need to (for example, when you manually add software to the instances after deployment).

Adding CUDA

CUDA does not have an official charm yet (coming up very soon!!), but there is my demoware implementation which you can find on GitHub. It has been updated for this post to CUDA 8.0.61 and drivers 375.26.

Make sure you have the charm tools available, clone and build the CUDA charm:

sudo apt install charm charm-tools

# Exporting the ENV
mkdir -p ~/charms ~/charms/layers ~/charms/interfaces
export JUJU_REPOSITORY=${HOME}/charms
export LAYER_PATH=${JUJU_REPOSITORY}/layers
export INTERFACE_PATH=${JUJU_REPOSITORY}/interfaces

# Build the charm
cd ${LAYER_PATH}
git clone cuda
charm build cuda

This will create a new folder called builds in JUJU_REPOSITORY, and another called cuda in there.

Now you can deploy the charm

juju deploy --series xenial $HOME/charms/builds/cuda
juju add-relation cuda kubernetes-worker-gpu
juju add-relation cuda kubernetes-worker-gpu8

This will take a fair amount of time, as CUDA takes a long time to install (CDK takes about 10 minutes, and CUDA alone probably 15).
Nevertheless, at the end the status should show:

juju status
Model    Controller     Cloud/Region   Version
default  aws-us-east-1  aws/us-east-1  2.1-rc1

App                     Version  Status  Scale  Charm              Store       Rev  OS      Notes
cuda                             active      2  cuda               local         2  ubuntu
easyrsa                 3.0.1    active      1  easyrsa            jujucharms    6  ubuntu
etcd                    2.2.5    active      1  etcd               jujucharms   23  ubuntu
flannel                 0.7.0    active      6  flannel            jujucharms   10  ubuntu
kubernetes-master       1.5.2    active      1  kubernetes-master  jujucharms   11  ubuntu  exposed
kubernetes-worker-cpu   1.5.2    active      3  kubernetes-worker  jujucharms   13  ubuntu  exposed
kubernetes-worker-gpu   1.5.2    active      1  kubernetes-worker  jujucharms   13  ubuntu  exposed
kubernetes-worker-gpu8  1.5.2    active      1  kubernetes-worker  jujucharms   13  ubuntu  exposed

Unit                       Workload  Agent  Machine  Public address  Ports           Message
easyrsa/0*                 active    idle   0/lxd/0                                  Certificate Authority connected.
etcd/0*                    active    idle   0                        2379/tcp        Healthy with 1 known peers.
kubernetes-master/0*       active    idle   0                        6443/tcp        Kubernetes master running.
  flannel/0*               active    idle                                            Flannel subnet
kubernetes-worker-cpu/0    active    idle   4                        80/tcp,443/tcp  Kubernetes worker running.
  flannel/4                active    idle                                            Flannel subnet
kubernetes-worker-cpu/1*   active    idle   5                        80/tcp,443/tcp  Kubernetes worker running.
  flannel/2                active    idle                                            Flannel subnet
kubernetes-worker-cpu/2    active    idle   6                        80/tcp,443/tcp  Kubernetes worker running.
  flannel/3                active    idle                                            Flannel subnet
kubernetes-worker-gpu8/0*  active    idle   3                        80/tcp,443/tcp  Kubernetes worker running.
  cuda/1                   active    idle                                            CUDA installed and available
  flannel/5                active    idle                                            Flannel subnet
kubernetes-worker-gpu/0*   active    idle   1                        80/tcp,443/tcp  Kubernetes worker running.
  cuda/0*                  active    idle                                            CUDA installed and available
  flannel/1                active    idle                                            Flannel subnet

Machine  State    DNS  Inst id              Series  AZ
0        started       i-09ea4f951f651687f  xenial  us-east-1a
0/lxd/0  started       juju-65a910-0-lxd-0  xenial
1        started       i-03c3e35c2e8595491  xenial  us-east-1c
3        started       i-0ca0716985645d3f2  xenial  us-east-1d
4        started       i-02de3aa8efcd52366  xenial  us-east-1e
5        started       i-092ac5367e31188bb  xenial  us-east-1a
6        started       i-0a0718343068a5c94  xenial  us-east-1c

Relation      Provides                Consumes                Type
juju-info     cuda                    kubernetes-worker-gpu   regular
juju-info     cuda                    kubernetes-worker-gpu8  regular
certificates  easyrsa                 etcd                    regular
certificates  easyrsa                 kubernetes-master       regular
certificates  easyrsa                 kubernetes-worker-cpu   regular
certificates  easyrsa                 kubernetes-worker-gpu   regular
certificates  easyrsa                 kubernetes-worker-gpu8  regular
cluster       etcd                    etcd                    peer
etcd          etcd                    flannel                 regular
etcd          etcd                    kubernetes-master       regular
cni           flannel                 kubernetes-master       regular
cni           flannel                 kubernetes-worker-cpu   regular
cni           flannel                 kubernetes-worker-gpu   regular
cni           flannel                 kubernetes-worker-gpu8  regular
cni           kubernetes-master       flannel                 subordinate
kube-dns      kubernetes-master       kubernetes-worker-cpu   regular
kube-dns      kubernetes-master       kubernetes-worker-gpu   regular
kube-dns      kubernetes-master       kubernetes-worker-gpu8  regular
cni           kubernetes-worker-cpu   flannel                 subordinate
juju-info     kubernetes-worker-gpu   cuda                    subordinate
cni           kubernetes-worker-gpu   flannel                 subordinate
juju-info     kubernetes-worker-gpu8  cuda                    subordinate
cni           kubernetes-worker-gpu8  flannel                 subordinate

Let us see what nvidia-smi gives us:

juju ssh kubernetes-worker-gpu/0 sudo nvidia-smi
Tue Feb 14 13:28:42 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   33C    P0    81W / 149W |      0MiB / 11439MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

On the more powerful 8xlarge,

juju ssh kubernetes-worker-gpu8/0 sudo nvidia-smi
Tue Feb 14 13:59:24 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 0000:00:17.0     Off |                    0 |
| N/A   41C    P8    31W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           On   | 0000:00:18.0     Off |                    0 |
| N/A   36C    P0    70W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           On   | 0000:00:19.0     Off |                    0 |
| N/A   44C    P0    57W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           On   | 0000:00:1A.0     Off |                    0 |
| N/A   38C    P0    70W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           On   | 0000:00:1B.0     Off |                    0 |
| N/A   43C    P0    57W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           On   | 0000:00:1C.0     Off |                    0 |
| N/A   38C    P0    69W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           On   | 0000:00:1D.0     Off |                    0 |
| N/A   44C    P0    58W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           On   | 0000:00:1E.0     Off |                    0 |
| N/A   38C    P0    71W / 149W |      0MiB / 11439MiB |     39%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Aaaand yes!! We have our 8 GPUs as expected so 8x 12GB = 96GB Video RAM!
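The 96GB figure uses the nominal 12GB per card. As a quick cross-check against what nvidia-smi actually reports (11439MiB per GPU in the output above), the total can be worked out with shell arithmetic. The variable names below are illustrative only.

```shell
gpus=8
mib_per_gpu=11439   # per-card total from the Memory-Usage column above

total_mib=$(( gpus * mib_per_gpu ))
echo "${total_mib} MiB"              # prints "91512 MiB"

# Integer division, so this rounds down to whole GiB.
echo "$(( total_mib / 1024 )) GiB"   # prints "89 GiB"
```

So the memory visible to workloads is a little over 89GiB across the eight cards.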

At this stage, we only have them enabled on the hosts. Now let us add GPU support in Kubernetes.

Adding GPU support in Kubernetes

By default, CDK will not activate GPUs when starting the API server and the Kubelets. We need to do that manually (for now).

Master Update

On the master node, update /etc/default/kube-apiserver to add:

# Security Context
KUBE_ALLOW_PRIV="--allow-privileged=true"

before restarting the API Server. This can be done programmatically with:

juju show-status kubernetes-master --format json | \
  jq --raw-output '.applications."kubernetes-master".units | keys[]' | \
  xargs -I UNIT juju ssh UNIT "echo -e '\n# Security Context \nKUBE_ALLOW_PRIV=\"--allow-privileged=true\"' | sudo tee -a /etc/default/kube-apiserver && sudo systemctl restart kube-apiserver.service"

So now the Kube API will accept requests to run privileged containers, which are required for GPU workloads.
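Appending with tee -a adds the setting again every time the command is re-run. A minimal sketch of an append that is safe to repeat is shown here; append_once is a hypothetical helper, and the demo writes to a scratch file rather than /etc/default/kube-apiserver (on a real node the write would need sudo).

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already there.
append_once() {
  file="$1"
  line="$2"
  grep -qxF -- "${line}" "${file}" 2>/dev/null || printf '%s\n' "${line}" >> "${file}"
}

# Demo on a scratch file standing in for the real config.
conf="$(mktemp)"
append_once "${conf}" 'KUBE_ALLOW_PRIV="--allow-privileged=true"'
append_once "${conf}" 'KUBE_ALLOW_PRIV="--allow-privileged=true"'   # second call changes nothing

grep -c 'KUBE_ALLOW_PRIV' "${conf}"   # prints 1
```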

Worker nodes

On every worker, update /etc/default/kubelet to add the GPU flag, so it looks like:

# Security Context
KUBE_ALLOW_PRIV="--allow-privileged=true"

# Add your own!
KUBELET_ARGS="--experimental-nvidia-gpus=1 --require-kubeconfig --kubeconfig=/srv/kubernetes/config --cluster-dns= --cluster-domain=cluster.local"

before restarting the service.
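Before touching a live worker, the KUBELET_ARGS change can be rehearsed on a scratch file. This is a sketch under assumptions: the sample line is made up for the demo (a real file carries site-specific values), and the guard against adding the flag twice is an addition, not something the original procedure does.

```shell
# Scratch file with a sample KUBELET_ARGS line (illustrative content only).
conf="$(mktemp)"
printf '%s\n' 'KUBELET_ARGS="--require-kubeconfig --cluster-domain=cluster.local"' > "${conf}"

# Insert the GPU flag right after the opening quote, unless it is already present.
if ! grep -q -- '--experimental-nvidia-gpus=' "${conf}"; then
  sed 's/^KUBELET_ARGS="/KUBELET_ARGS="--experimental-nvidia-gpus=1 /' "${conf}" > "${conf}.new"
  mv "${conf}.new" "${conf}"
fi

cat "${conf}"
```

Writing to a temporary file and moving it into place avoids sed -i, whose syntax differs between GNU and BSD sed.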

This can be done with

for WORKER_TYPE in gpu gpu8
do
  juju show-status kubernetes-worker-${WORKER_TYPE} --format json | \
    jq --raw-output '.applications."kubernetes-worker-'${WORKER_TYPE}'".units | keys[]' | \
    xargs -I UNIT juju ssh UNIT "echo -e '\n# Security Context \nKUBE_ALLOW_PRIV=\"--allow-privileged=true\"' | sudo tee -a /etc/default/kubelet"
  juju show-status kubernetes-worker-${WORKER_TYPE} --format json | \
    jq --raw-output '.applications."kubernetes-worker-'${WORKER_TYPE}'".units | keys[]' | \
    xargs -I UNIT juju ssh UNIT "sudo sed -i 's/KUBELET_ARGS=\"/KUBELET_ARGS=\"--experimental-nvidia-gpus=1\ /' /etc/default/kubelet && sudo systemctl restart kubelet.service"
done

Testing our setup

Now we want to know if the cluster actually has GPUs enabled. To validate, run a job with an nvidia-smi pod:

kubectl create -f src/nvidia-smi.yaml

Then wait a little bit and run the log command:

kubectl logs $(kubectl get pods -l name=nvidia-smi -o=name -a)
Tue Feb 14 14:14:57 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:00:17.0     Off |                    0 |
| N/A   47C    P0    56W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K80           Off  | 0000:00:18.0     Off |                    0 |
| N/A   39C    P0    70W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K80           Off  | 0000:00:19.0     Off |                    0 |
| N/A   48C    P0    57W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K80           Off  | 0000:00:1A.0     Off |                    0 |
| N/A   41C    P0    70W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla K80           Off  | 0000:00:1B.0     Off |                    0 |
| N/A   47C    P0    58W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla K80           Off  | 0000:00:1C.0     Off |                    0 |
| N/A   40C    P0    69W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla K80           Off  | 0000:00:1D.0     Off |                    0 |
| N/A   48C    P0    59W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla K80           Off  | 0000:00:1E.0     Off |                    0 |
| N/A   41C    P0    72W / 149W |      0MiB / 11439MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

What is interesting here is that the pod sees all the cards, even though we only shared the /dev/nvidia0 char device. At runtime, we would have problems.

If you want to run multi GPU containers, you need to share all char devices like we do in the second yaml file (nvidia-smi-8.yaml)
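For reference, the NVIDIA driver exposes one character device per card, /dev/nvidia0 through /dev/nvidia7 on the 8-GPU worker, alongside shared control devices such as /dev/nvidiactl and /dev/nvidia-uvm. The sketch below only lists the per-card paths for a given GPU count; gpu_devices is a hypothetical helper, and how nvidia-smi-8.yaml actually mounts them is not reproduced here.

```shell
# gpu_devices N: print the per-GPU character device paths for N cards.
gpu_devices() {
  i=0
  while [ "${i}" -lt "$1" ]; do
    echo "/dev/nvidia${i}"
    i=$(( i + 1 ))
  done
}

# The p2.8xlarge worker has 8 cards.
gpu_devices 8
```

On a worker, comparing this list with the output of ls /dev/nvidia* is a quick way to confirm that every card has a device node before sharing them with a pod.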


We reached the first milestone of our 3 part journey: the cluster is up & running, GPUs are activated, and Kubernetes will now welcome GPU workloads.

If you are a data scientist or running Kubernetes workloads that could benefit from GPUs, this already gives you an elegant and very fast way of managing your setups. But usually in this context, you also need to have storage available between the instances, whether it is to share the dataset or to exchange results.

Kubernetes offers many options to connect storage. In the second part of the blog, we will see how to automate adding EFS storage to our instances, then put it to good use with some datasets!

In the meantime, feel free to contact me if you have a specific use case in the cloud and want to discuss operational details. I would be happy to help you set up your own GPU cluster and get you started for the science!

Tearing Down

Whenever you feel like it, you can tear down this cluster. These instances can be pricey, hence powering them down when you do not use them is not a bad idea.

juju kill-controller aws-us-east-1

This will ask for confirmation then destroy everything… But now, you are just a few commands and a coffee away from rebuilding it, so that is not a problem.

Categories: Linux News

Control Plane Engineering Is Key for Big Kubernetes Deployments - Wed, 02/15/2017 - 09:00

If you’re interested in running a complex Kubernetes system across several different cloud environments, you should check out what Bob Wise and his team at Samsung SDS call “Control Plane Engineering.”

Wise, during his keynote at CloudNativeCon last year, explained the concept of building a system that sits on top of the server nodes to ensure better uptime and performance across multiple clouds, creates a deployment that’s easily scaled by the ClusterOps team, and covers long-running cluster requirements.

Categories: Linux News

Using Scripting Languages in IoT: Challenges and Approaches - Wed, 02/15/2017 - 08:49

Scripting languages (aka Very High-Level Languages or VHLLs), such as Python, PHP, and JavaScript are commonly used in desktop, server, and web development. And, their powerful built-in functionality lets you develop small useful applications with little time and effort, says Paul Sokolovsky, IoT engineer at Linaro. However, using VHLLs for deeply embedded development is a relatively recent twist in IoT.

Categories: Linux News

Intro to Control Plane Engineering by Bob Wise, Samsung SDS - Wed, 02/15/2017 - 08:39

Large, high-performance and reliable Kubernetes clusters require engineering the control plane components for demands beyond the defaults. This talk covers the relationship between the various components that make up the Kubernetes control plane and how to design and size those components.

Categories: Linux News

How and Why to do Open Source Compliance Training at Your Company - Wed, 02/15/2017 - 08:20
Categories: Linux News

Whatever is an Unofficial Evernote App for Linux

OMG Ubuntu - Wed, 02/15/2017 - 08:05

Evernote is practically a by-word for being well organised and super productive — and not just in the minds of its 100 million users but among those who, like me, aspire to be. But with no official Evernote Linux app available it’s been left to the community to plug the productivity gap with unofficial alternatives, like NixNote, EverPad, NeverNote, and the Ubuntu Touch notes […]

This post, Whatever is an Unofficial Evernote App for Linux, was written by Joey Sneddon and first appeared on OMG! Ubuntu!.

Categories: Linux News

The Biggest Risk with Container Security is Not Containers - Wed, 02/15/2017 - 07:37

Container security may be a hot topic today, but we’re failing to recognize lessons from the past. As an industry our focus is on the containerization technology itself and how best to secure it, with the underlying logic that if the technology is itself secure, then so too will be the applications hosted.

Categories: Linux News

Futhark: A Pure, Functional Language For GPU Computing

Phoronix - Wed, 02/15/2017 - 06:47
Futhark was presented earlier this month at FOSDEM as a "purely functional array language" with its compiler able to "efficiently generate high-performance GPU code."..
Categories: Linux News


Subscribe to South Mississippi Linux User Group aggregator - Linux News