Tell us what you want, and help us improve the DevOpsCloud website to make it easier to use.

To stop spammers/bots in Telegram, we have added a captcha for joining the Telegram group, which means every new member has to authenticate within 60 seconds of joining the group.


Published Articles (117)

Sort by:
  • All
  • AWS (52)
  • Azure (31)
  • DevOps (9)
  • FREE Udemy Courses (6)
  • GCP (1)
  • Linux (1)


AVR posted:
12 months ago
What is Cloud?
Cloud is nothing but a way to get the Infrastructure for our project easily, on demand.
When we use Cloud, we don't have to invest money into servers/storage/manpower/electricity etc.

How to launch an EC2 instance in AWS?
EC2 stands for Elastic Compute Cloud.
We need an EC2 instance to install applications and expose them to the outside world via a port no.
Go to AWS Account
Click on EC2
Launch instance
Name: (Provide a meaningful name)
OS: (Pick an OS; I would recommend Red Hat Linux)
Instance type: (As a part of learning, we could go with t2.micro, which is eligible for the free tier)
Key pair: (Create one, as we need this to connect to the EC2 instance)
Once the instance is successfully launched, we can connect to it via the Git Bash terminal.
If you don't have Git Bash, you can download Git Bash for Windows.
We need to go to the path where our pem key is saved; this is where we can execute the ssh command.
Go to the SSH client tab of the EC2 instance and copy the ssh command.
Once we are connected to the EC2 instance via the Git Bash terminal, we can execute all the basic commands of Linux like
sudo -i
whoami
date
cal


How to install nginx on AWS EC2 Instance?
nginx and Apache are web servers, and their default port no is 80
tomcat/WebLogic/WebSphere are application servers. Tomcat's default port no is 8080
We must execute the below commands in the Red Hat Linux EC2 instance.
yum install nginx -y
systemctl enable nginx
systemctl start nginx
systemctl status nginx


How to check the nginx in the browser?
Go to the browser and give the public IP of the EC2 instance.
Browser default port no is 80 only; no need to give port no 80 separately
We need to open port 80 in the security group as mandatory; if not, Nginx would not work in the browser
How can the security group be changed if port no 80 is not allowed as inbound?
Go to the appropriate security group
edit inbound rules
add rule
custom TCP    80    Anywhere
SAVE
Please note that we allow Anywhere only in the training sessions, not in the enterprise environment.
Once port no 80 is allowed as inbound
Go to the browser and give publicip of the EC2 instance.  
We should be able to see the Nginx landing page successfully.

How to stop Nginx?
systemctl stop nginx


How to start Nginx?
systemctl start nginx



How to install Apache on AWS EC2 Instance?
We must execute the below commands in the Red Hat Linux EC2 instance.
yum install httpd
systemctl enable httpd
systemctl start httpd
Please note that only one service can run on one port.
We need to ensure that no other services are running on port no 80, as Apache uses this port no.



How to see the list of all executed commands from the terminal?
history is the command we need to use to get the list of all executed commands.
Posted in: AWS | ID: Q118 |
May 08, 2023, 06:53 PM | 1 Replies
AVR posted:
1 year ago
Python contains the following data types.

int
float
complex
bool
str
bytes
bytearray
range
list
tuple
set
frozenset
dict
None
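
As a quick illustration (the values below are just examples), the built-in type() function shows which type a value belongs to:

print(type(10))          # <class 'int'>
print(type(3.14))        # <class 'float'>
print(type(2 + 3j))      # <class 'complex'>
print(type(True))        # <class 'bool'>
print(type("hello"))     # <class 'str'>
print(type([1, 2, 3]))   # <class 'list'>
print(type((1, 2)))      # <class 'tuple'>
print(type({1, 2}))      # <class 'set'>
print(type({"a": 1}))    # <class 'dict'>
print(type(None))        # <class 'NoneType'>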
Posted in: Python | ID: Q117 |
February 22, 2023, 11:31 PM | 0 Replies
AVR posted:
1 year ago
List of reserved words/keywords in python:

We have 33 reserved words in python (newer Python 3 versions also add async and await, making 35)

True, False, None (T, F and N are uppercase)
and, or, not, is
if, elif, else
while, for, break, continue, return, in, yield
try, except, finally, raise, assert
import, from, as, class, def, pass, global, nonlocal, lambda, del, with
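
We can also print the full list from Python itself; a minimal sketch:

import keyword
print(keyword.kwlist)        # all reserved words for the running Python version
print(len(keyword.kwlist))   # 33 on older Python 3 releases, 35 from Python 3.7 onwards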
Posted in: Python | ID: Q116 |
February 22, 2023, 11:27 PM | 0 Replies
AVR posted:
1 year ago
What is a Python identifier?
A name in a Python program is called an Identifier.
It can be a variable name, class name, method name, etc.
a = 10 (here, a is the identifier)
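
A few more illustrative examples (all the names below are made up):

count = 10                # count is an identifier for a variable
class Employee:           # Employee is an identifier for a class
    def get_name(self):   # get_name is an identifier for a method
        return "AVR"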
Posted in: Python | ID: Q115 |
February 22, 2023, 11:25 PM | 0 Replies
AVR posted:
1 year ago
What do you know about the EditPlus editor?
This is a commonly used lightweight editor for python.
Go to editplus.com and download the EditPlus installer.
Download the exe file and install it by accepting the terms.
Once it is installed, open EditPlus and we're good to write the code.
Posted in: Python | ID: Q114 |
February 22, 2023, 11:21 PM | 0 Replies
AVR posted:
1 year ago
What is AKS?
-------------------
AKS stands for Azure Kubernetes Service
We should understand the difference between Monolithic vs Microservices
Monolithic means the application components are tightly coupled.
If any change happens in a Monolithic application, the entire application has to be brought down and redeployed
Microservices is nothing but breaking down an application into multiple independent pieces
E-commerce websites in particular use Microservices as MANDATORY
We should have a basic understanding of virtualization vs containerization
Every Microservice can be containerized
Containers are lightweight and portable
Posted in: Azure | ID: Q113 |
February 22, 2023, 10:52 AM | 0 Replies
AVR posted:
1 year ago
How to write a program in notepad and save it locally?
--------------------------------------------------------------------------
Write the code
and save the file with "All Files" selected as the save-as type.
The file name could be anything, but it must have the extension (.py)
Now
How to run this program?
Go to the file location from cmd prompt and run the below command
python filename.py
We get the output automatically
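
For example, a file saved as hello.py (the name is just an example) could contain:

# hello.py - a minimal example program
print("Hello from Python")

Running python hello.py from the folder where the file is saved prints the message.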
Posted in: Python | ID: Q112 |
February 22, 2023, 10:47 AM | 0 Replies
AVR posted:
1 year ago
What is IDLE?
IDLE stands for Integrated Development and Learning Environment
It is a REPL tool: it Reads, Evaluates, and Prints in a Loop
R-Read
E-Evaluate
P-Print
L-Loop
Posted in: Python | ID: Q111 |
February 22, 2023, 10:45 AM | 0 Replies
AVR posted:
1 year ago
Python Installation:
---------------------------
Go to the official website -> www.python.org
Download the latest python exe file for Windows
Run the exe file
We need to select the option to add python.exe to PATH (this comes as a part of the Installation step)
How to validate python installation?
Go to the cmd prompt and type the keyword "python"
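
If the installation succeeded, the python command opens the interpreter, and we can confirm the version from inside it; a small check:

import sys
print(sys.version)   # prints the installed Python version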
Posted in: Python | ID: Q110 |
February 22, 2023, 10:44 AM | 0 Replies
AVR posted:
1 year ago
Features of python:
--------------------------
Simple and easy to learn
Open source
High-level language
Platform independent
Dynamically typed(Type refers to data types)
Procedure Oriented and Object Oriented
Extensive libraries are available for developers to build applications.
Posted in: Python | ID: Q109 |
February 22, 2023, 10:42 AM | 0 Replies
AVR posted:
1 year ago
What is python?
Python is a programming language.
Python is widely used in the IT Industry today.
We can develop applications using a programming language.
It is a high-level language
High-level languages are developer friendly.
Posted in: Python | ID: Q108 |
February 22, 2023, 10:42 AM | 0 Replies
Nagaraj posted:
1 year ago
Can I get help with Python scripting for automating DevOps implementation?
Posted in: Python | ID: Q107 |
December 31, 2022, 05:18 PM | 0 Replies
AVR posted:
1 year ago
What do you know about the s3 bucket in AWS?
Amazon S3 stands for Amazon Simple Storage Service.
We can store files/images/videos/log files/war files/objects etc
The maximum size of a single S3 object is 5 TB; a bucket itself has no fixed size limit
By default, we can create 100 S3 buckets per account. Beyond this, we need to raise a service quota increase request with AWS.
What are the advantages of an S3 bucket?
All IT Companies can use S3 bucket with DevOps and without DevOps
S3 is just storage where IT companies have the flexibility to implement and start using it.
Every company would have some data, and they have to store it somewhere, and this is where they could use AWS S3 Service.
The S3 bucket name MUST be unique globally
S3 bucket name MUST be with lowercase letters
The S3 bucket can be accessed globally
S3 also has lifecycle policies that we use to reduce the billing as per the business need.
S3 standard is expensive, and this is for everyday usage. This is like instant download; we can download the files instantly without waiting.
S3 Glacier is inexpensive as this can be used once in a while, like every 3 months/every 6 months. Download won't happen immediately, as downloading the files may take an hour or a few hours.
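
For reference, the same ideas look roughly like this from code using boto3 (this is only a sketch; it assumes AWS credentials are already configured, and the bucket name is just an example):

import boto3

s3 = boto3.client("s3")

# Bucket names must be globally unique and lowercase
# (outside us-east-1, create_bucket also needs a CreateBucketConfiguration with LocationConstraint)
s3.create_bucket(Bucket="my-example-devopscloud-bucket")

# Upload a local file as an object into the bucket
s3.upload_file("app.log", "my-example-devopscloud-bucket", "logs/app.log")

# List the objects stored in the bucket
for obj in s3.list_objects_v2(Bucket="my-example-devopscloud-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])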
Posted in: AWS | ID: Q106 |
November 01, 2022, 12:19 AM | 0 Replies
AVR posted:
1 year ago
Let's learn something about MFA(Multi-factor authentication)
Go to AWS Account at header menu - click on Security Credentials
This is where we can see the MFA option
Click on MFA, where we can see the option "Activate MFA"
Click on Activate MFA
Select the option(Virtual MFA device)
Click on Continue
Click on Show QR Code
We need to download the Microsoft Authenticator App on the Mobile
From the App, we need to scan the QR code
The App then generates the codes, where we get code 1 and code 2
Enter the code 1
Enter the code 2
Now click on Assign MFA
How to test the MFA?
Sign in back from the console
This is where it asks MFA code after the password to login
Why are we activating MFA? Due to enhanced security, companies have started using this MFA as mandatory.

How to delete MFA?
Go to AWS Account at header menu - click on Security Credentials
This is where we can see the MFA option
Click on MFA, where we can see the option "Manage"
If we want to remove this, then click on the Remove option
Posted in: AWS | ID: Q105 |
November 01, 2022, 12:17 AM | 0 Replies
AVR posted:
1 year ago
How to delete IAM users in AWS?
Go to IAM
Click on users
Select the appropriate users
Click on the Delete button
If any prompt comes, follow the AWS instructions
Posted in: AWS | ID: Q104 |
November 01, 2022, 12:15 AM | 0 Replies
AVR posted:
1 year ago
What is IAM & What do you know about IAM in AWS?

IAM stands for Identity and Access Management
Let's assume that we have 100 users in the company, and all will access only one AWS Account.
There could be two accounts also depending on the environments and how their infrastructure has been planned
Now the question is how the access would be granted to the users.
Some people may need only access to the s3/ec2/load balancer. Not everyone needs full access to AWS.
Now we need to learn how to restrict the user or users with roles
IAM is the one who helps with this requirement

Search for IAM
We can see the IAM dashboard
Left Menu - Click on Users
Click on Add users
username-chak
Select AWS credential type - We have two options, and we can select both checkboxes
Programmatic access is nothing but accessing AWS from the command line or APIs instead of the GUI. This is where we use the access key and secret key.
Click on Next: Permissions.
Click on Create group.
Group name - devgroupaccess
Search for s3 as a keyword
Select AmazonS3FullAccess
Here I'm giving only S3FullAccess. Other than this, users cannot access anything else.
Click on Create group
Click on Next: Tags
(Key, Value) we can specify anything as these are just tags (name IAM)
Click on Review
Click on Create user
On the confirmation page, we can see the sign-in URL and Download.csv option.
Now the user can log in with credentials.

NOTE:
For the root user, we don't need an Account ID in the URL. The root user is nothing but the Admin in the company.
For a normal user, we need an Account ID in the URL
When a normal user signs in as an IAM user, it asks the below fields as MANDATORY.
Account ID
IAM user name
Password
Users must change the password at the time of first-time login as per the policy.

How to give AmazonEC2FullAccess to the normal user?
Go to the Admin/Root user account
Go to IAM
Go to Users - click on the correct user where we need to grant permissions
Click on the Groups tab
Click on the Group name it is assigned
Click on the Permissions tab
Click on Add permissions
Click on Attach policies
Now search for the policy "AmazonEC2FullAccess"
Click on Add permissions
The group permissions have been updated, and the user can get the newly added role as expected.
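
The same flow can also be scripted; below is a minimal sketch using boto3 (the user name, group name and password are just examples, and credentials are assumed to be configured already):

import boto3

iam = boto3.client("iam")

# Create a group and attach the managed S3 full-access policy to it
iam.create_group(GroupName="devgroupaccess")
iam.attach_group_policy(
    GroupName="devgroupaccess",
    PolicyArn="arn:aws:iam::aws:policy/AmazonS3FullAccess",
)

# Create the user, give it a console password, and add it to the group
iam.create_user(UserName="chak")
iam.create_login_profile(UserName="chak", Password="ChangeMe#12345", PasswordResetRequired=True)
iam.add_user_to_group(GroupName="devgroupaccess", UserName="chak")

# Later, granting EC2 access to everyone in the group is one more policy attachment
iam.attach_group_policy(
    GroupName="devgroupaccess",
    PolicyArn="arn:aws:iam::aws:policy/AmazonEC2FullAccess",
)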
Posted in: AWS | ID: Q103 |
November 01, 2022, 12:14 AM | 0 Replies
AVR posted:
1 year ago
How to install Jenkins on AWS EC2 Linux Machine?
Go to AWS
Launch an Instance
Name-Provide a name
OS Image- I pick Red Hat Linux OS
Instance type - t2.micro for learning purposes only
Key pair - Yes, we need it
Security Group - Yes, we need it as Mandatory
SSH 22 should open
Custom TCP 8080 should also open for Jenkins
Launch the Instance and wait patiently to see the Instance state as Running
Connect to EC2 Machine using an SSH client
Pick the ssh command provided by the AWS
Open Git Bash -
Go to the location where pem file is downloaded
cd downloads
paste the ssh command and get connected to EC2 Red Hat Linux machine
$ says we are logged in with a normal user
sudo -i
Now we can see # as the user is the root user

Let's install java as a part of the Tomcat installation
yum install wget zip unzip -y
JDK 8u131 is the version which has been tested at my end.
stackoverflow.com/questions/10268583/downloading-java-jdk-on-linux-via-wget-is-shown-license-page-instead
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm
We have downloaded jdk-8u131-linux-x64.rpm
Once it is downloaded, use ls command
ls
rpm -ivh jdk-8u131-linux-x64.rpm
java -version
wget dlcdn.apache.org/tomcat/tomcat-8/v8.5.83/bin/apache-tomcat-8.5.83.zip
Once it is downloaded, use ls command
ls
unzip apache-tomcat-8.5.83.zip
ls
mv apache-tomcat-8.5.83 /opt/tomcat
ls
How to start the Tomcat server?
chmod -R 755 /opt/tomcat (Here we are granting appropriate file permissions) Here, R is Recursive, and 755 permissions are meant for user, group and others.
7 means 4+2+1 (Read+Write+Execute) for the owner; 5 means 4+1 (Read+Execute) for the group and others
/opt/tomcat/bin/startup.sh
This should start the Tomcat server as expected
Now we can go to the browser and test the Tomcat page
Go to the browser and type EC2PUBLICIP:8080
We should get the Tomcat landing page as expected.
If we don't get the tomcat page, we need to ensure port no 8080 is open in the ec2 machine security settings.
Click on the active security group.
Click on edit inbound rules
Click on Add rule
Custom TCP should be Type, and Port No should be 8080
We must save the rules


How to check the status of Tomcat?
sudo -i
ps -ef | grep tomcat
grep stands for Global regular expression print
Once Tomcat is up and running, the ps output shows the running Tomcat process with its configuration details
How to stop the Tomcat?
/opt/tomcat/bin/shutdown.sh
ps -ef | grep tomcat
Now we don't see the Tomcat process, as we stopped Tomcat
Let's start the tomcat server
/opt/tomcat/bin/startup.sh
ps -ef | grep tomcat


How to deploy (jenkins.war) in Tomcat?
updates.jenkins.io/download/war/
Avoid the very latest version and go with an older, proven version
Let's pick 2.354 version for the time being
updates.jenkins.io/download/war/2.354/jenkins.war (This is the original link)
wget updates.jenkins.io/download/war/2.369/jenkins.war
ls
cp jenkins.war /opt/tomcat/webapps/
Now go to the browser and type EC2PUBLICIP:8080/jenkins
If we see any problems, stop the tomcat and start the tomcat server
/opt/tomcat/bin/shutdown.sh
/opt/tomcat/bin/startup.sh
If the Jenkins war version is not working, we can delete it:
cd /opt/tomcat/webapps/
ls
rm -rf jenkins jenkins.war (Here we are deleting both the exploded jenkins folder and jenkins.war)
ls
We need to unlock Jenkins with initialAdminPassword
Go to Terminal
and use the below command
cat /root/.jenkins/secrets/initialAdminPassword
Copy the password and paste it at Jenkins landing page to Unlock
Click on continue
Click on Install suggested plugins(Here we are customizing Jenkins)
Now the question is, why do we need plugins?
To work with Ansible/Docker/Kubernetes/Terraform, we should have the appropriate plugins in Jenkins for integration purposes, which makes the work easier.
We need plugins to configure the connectivity
Create First Admin User
Click on Start using Jenkins
Now we can see Jenkins Dashboard
As an example
Click on New Item
Enter an item name
We can see many options as Freestyle project/Pipeline/Multi configuration project.
Let's select the Freestyle project
Go to Build - Add build step - Execute shell
In the command box, we can simply give the date as a keyword
Click on Apply and Save
Click on Build Now on Left Menu
Left side, click on the tick mark, which shows the Console Output
We can also check the build in the Gitbash terminal
ls -ltra
cd .jenkins
ls
cd workspace
ls
Here we can see the project name
If we build any related package, we can see more details here.
Posted in: DevOps | ID: Q102 |
October 31, 2022, 02:28 AM | 0 Replies
AVR posted:
1 year ago
What do you know about web servers and application servers?
Examples of web servers are Nginx or apache.
Examples of Application servers are Tomcat/Weblogic/Websphere.
What is the difference between a web server and an application server?
A web server is nothing but a static one serving read-only content, whereas an application server runs the application code and serves dynamic content.




How to install nginx web servers?
We need to execute the below commands
sudo -i
yum install nginx -y
systemctl enable nginx
systemctl start nginx
systemctl status nginx
It should be active and running
Go to browser and type EC2PUBLICIP or EC2PUBLICIP:80 (Browser default port no is 80, so we don't need to give port 80 explicitly)
We should get nginx landing page
systemctl stop nginx
Now the nginx landing page should no longer work, as expected.





How to install apache web servers?
We need to execute the below commands
sudo -i
yum install httpd -y
systemctl enable httpd
systemctl start httpd
Here we need to make sure that only either nginx or apache is running on port no 80 as both cannot run at a time with the same port no
If nginx is running on port no 80, we need to stop nginx 1st and then start the apache web server
systemctl status httpd
It should be active and running
Go to browser and type EC2PUBLICIP or EC2PUBLICIP:80 (Browser default port no is 80, so we don't need to give port 80 explicitly)
We should get apache landing page
systemctl stop httpd
Now the apache landing page should no longer work, as expected.




How to install the tomcat application server?
To install tomcat, the pre-requisite is java
1st we need to install java as mandatory
yum install wget zip unzip -y
JDK 8u131 is the version which has been tested at my end.
stackoverflow.com/questions/10268583/downloading-java-jdk-on-linux-via-wget-is-shown-license-page-instead
wget -c --header "Cookie: oraclelicense=accept-securebackup-cookie" download.oracle.com/otn-pub/java/jdk/8u131-b11/d54c1d3a095b4ff2b6607d096fa80163/jdk-8u131-linux-x64.rpm
We have downloaded jdk-8u131-linux-x64.rpm
Once it is downloaded, use ls command
ls
rpm -ivh jdk-8u131-linux-x64.rpm
java -version
wget dlcdn.apache.org/tomcat/tomcat-8/v8.5.83/bin/apache-tomcat-8.5.83.zip
Once it is downloaded, use ls command
ls
unzip apache-tomcat-8.5.83.zip
ls
mv apache-tomcat-8.5.83 /opt/tomcat
ls
How to start the Tomcat server?
chmod -R 755 /opt/tomcat (Here we are granting appropriate file permissions)
/opt/tomcat/bin/startup.sh
This should start the Tomcat server as expected
Now we can go to the browser and test the Tomcat page
Go to the browser and type EC2PUBLICIP:8080
We should get the Tomcat landing page as expected
If we don't get the tomcat page, we need to ensure port no 8080 is open in the ec2 machine security settings.
Click on the active security group.
Click on edit inbound rules
Click on Add rule
Custom TCP should be Type, and Port No should be 8080
We must save the rules
Posted in: DevOps | ID: Q101 |
October 30, 2022, 05:05 AM | 0 Replies
AVR posted:
1 year ago
What is EC2?
EC2 is nothing but Elastic Compute Cloud
It is nothing but an instance or VM.

How to launch Instance?
Click on the launch Instance
Name is Mandatory
Application and OS images- We need to select the appropriate OS(As an example: I pick Red Hat)
Instance type - (t2.micro) is only meant for learning purposes, and this one we cannot use in the Enterprise environment.
Key pair - We need to create one which is used for authentication(this is for login purposes)
Click on create new key pair
Provide a name
We have two formats(One is .pem, and the other one is .ppk)
.pem is for use with OpenSSH
.ppk is for use with PuTTY
Here I select .pem
Click on Create key pair (Automatically .pem file gets downloaded locally)
Network settings - We should understand what VPC is and how VPC is working internally
AWS gives default VPC for self-learning
We also have a security group which allows port numbers
For example
Web servers' default port no is 80
To connect to the Linux machine, port no 22 should be opened
Security group name - Specify a meaningful name
SSH stands for Secure Shell
TCP stands for Transmission Control Protocol
Source type - Anywhere (In companies, there would be a range where the connectivity happens only from those given IPs)
We can always add more security group rules as needed.
Custom TCP is the Type. Here I give 80 as the Port range. The source type is Anywhere or IP range or My IP
Next
Storage is 10 GB (or max 30 GB for Linux machines under the free tier) as a part of self-learning, which is FREE to use.
Click on Launch Instance
Click on view all instances
Now we can see our EC2 Instance up and running
How to connect to an EC2 machine?
Select the EC2 Instance and click on Connect button
Click on SSH client, where we can see all the instructions given by the AWS
AWS also provides an example for beginners who can understand easily.
The format looks like as below
ssh -i "NAME OF THE PEM FILE" username@ec2instancename
We can use the Git bash terminal to connect to EC2 Machine
Below are the basic commands to play around with in the Git Bash terminal
pwd is a command - present working directory
Here the catch is we need to go to the location where the .pem file is downloaded
To go there
cd downloads - this command takes us to the location where we saved the .pem file
Now paste the command here to connect to the EC2 Linux machine
Once successfully connected, we get the prompt with the username and private IP of the EC2 machine
This is just an example of how it looks like
[ec2-user@ip-172-31-41-51 ~] $
$ says the user is a normal user
We can verify this with the command whoami
If we want to switch from normal user to sudo user
sudo -i
Now we can see #
# says the user is root user
We can try a few commands like
whoami
date
cal
All the Linux commands should work as expected here
If we type exit two times, we come out of the session
We can also log in with the below command
ssh -i pemfile.pem ec2-user@ec2instancepublicip
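
For completeness, the same launch can also be done from code; a rough sketch with boto3 (the AMI ID and key pair name below are placeholders, not real values):

import boto3

ec2 = boto3.client("ec2")

# Launch a single free-tier instance (ami-xxxxxxxx is a placeholder AMI ID)
response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # e.g. a Red Hat Linux AMI for your region
    InstanceType="t2.micro",
    KeyName="my-key-pair",       # the key pair created earlier
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", response["Instances"][0]["InstanceId"])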
Posted in: AWS | ID: Q100 |
October 30, 2022, 05:02 AM | 0 Replies
AVR posted:
2 years ago
SPARK ARCHITECTURE:
------------------------------------
There are 5 parts to it
->Driver program
->Cluster Manager
->Worker Node
->Executor
->Task

Driver program:
---------------------
The Driver Program in the Apache Spark architecture runs the main program of an application and creates the SparkSession.
A SparkSession consists of all the basic functionalities.
Spark Driver contains various other components such as DAG Scheduler, Task Scheduler, and Backend Scheduler which are responsible for translating the user-written code into jobs that are actually executed on the cluster.
Job is split into multiple smaller tasks which are further distributed to worker nodes and can also be cached there.
SparkDriver and SparkSession collectively watch over the job execution within the cluster.
SparkDriver works with the Cluster Manager to manage various other jobs.

Cluster Manager:
-----------------------
The role of the cluster manager is to allocate resources across applications. Spark is capable of running on a large number of clusters.
It consists of various types of cluster managers such as Hadoop YARN, Apache Mesos, and Standalone Scheduler

Worker Node:
------------------
The worker node is a slave node
Its role is to run the application code in the cluster.

Executor:
-------------
An executor is a process launched for an application on a worker node.
It runs tasks and keeps data in memory or disk storage across them.
It reads and writes data to external sources
Every application has its own executors.

Task:
-------
A unit of work that will be sent to one executor.
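
As a small illustration of the Driver program side, this is roughly how a SparkSession is created in PySpark (the app name is arbitrary):

from pyspark.sql import SparkSession

# The driver creates the SparkSession, the entry point for all Spark functionality
spark = SparkSession.builder.appName("example-app").getOrCreate()

# Work submitted through the session is split into tasks and executed on the executors
df = spark.range(1000)
print(df.count())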
Posted in: Azure | ID: Q99 |
May 07, 2022, 10:30 AM | 0 Replies
AVR posted:
2 years ago
What do you know about Spark Architecture?

SPARK ARCHITECTURE:
-----------------------------------
There are 5 parts to it and let's understand how they interconnect each other internally
->Driver program
->Cluster Manager
->Worker Node
->Executor
->Task

A user writes the code and submits it to the Driver program and this is where the Spark Session gets started
Spark Session establishes the communication between the Driver program, Cluster manager & Worker nodes.
Driver Program has the code and asks the Cluster Manager to get the work done with the help of worker nodes
Now Cluster Manager would bring some worker nodes
We can have internally 1 worker node or 2 worker nodes or 3 worker nodes etc.
Typically Worker Node consists of Executors and Tasks
Driver Program asks the Cluster Manager to launch the workers.
The cluster Manager's responsibility is only to monitor the workers or worker nodes.
Once the Cluster Manager notices the active worker nodes it informs back to the Driver program
Now Driver Program gives the actual work to Worker Nodes directly
The executor executes the job and gives the result back to the Driver program.
Finally, the Driver program returns the end result to the user 
Posted in: Azure | ID: Q98 |
May 07, 2022, 10:28 AM | 0 Replies
AVR posted:
2 years ago
Real-time data processing vs. Batch processing:
----------------------------------------------------------------
->Real-time data collects the data and processes the data on the same day/immediately
->Collection & processing happens on the same day in Real-time
->Batch processing collects the data today and processes them tomorrow or the next day
->Batch processing has one day delay in processing the data
->Batch processing could be daily/weekly/monthly/quarterly in processing the data
->Every business has to adopt the new changes; if not, they cannot run the business in today's world
->Every customer looks for new features & it has to be done directly from the Mobile
->If the website is not working/mobile app is not working, we all know how business could get affected.
Posted in: Azure | ID: Q97 |
May 07, 2022, 10:25 AM | 0 Replies
AVR posted:
2 years ago
Why are we using spark when map-reduce was there in the IT Industry?
-----------------------------------------------------------------------------------------------
The drawbacks of map-reduce are as follows:
-> Map-reduce reads the data from the hard disk, whereas Spark reads the data from memory
->Reading data from memory is always highspeed
->All the calculations happen in memory in any computer; it doesn't happen in harddisk
->Even though the data is in a hard disk, it brings the data from the hard disk to memory and kicks off the operation in memory
->Memory is costly when compared to harddisk, but still, companies prefer In-Memory to process the data
->Map reduce does the operation in the harddisk only
->Map reduce uses harddisk
Posted in: Python | ID: Q96 |
May 07, 2022, 10:15 AM | 0 Replies
AVR posted:
2 years ago
What is Spark?
Why do we need Spark?
Why are organizations choosing Spark to transform the data rather than map-reduce/ADF?

->spark is in-memory
->Big data processing tool
->It can process real-time data along with the batch
->High speed, faster than map-reduce
->Distributed processing engine
->Parallel operations
->Lazy Evaluation
->Resilient
->Fault tolerance
->partitioning
Posted in: Python | ID: Q95 |
May 07, 2022, 10:13 AM | 0 Replies
AVR posted:
2 years ago
What is logging in Python?

When we use python code in Databricks to develop spark code, we need to use the python logging module to implement logging
What is the use of logging?
Why do we need to collect the logs?
Why do we need logs?
If any failures happen in the process/jobs, then this is where we need logs to fix any issues
What went wrong?
What is the error message?
We can quickly troubleshoot based on the error logs and appropriate action can be taken if necessary
Also,
while we are inserting data into database tables, this information can also be logged
We can also log the queries which got executed while inserting the data into tables
Logs are also used to improve the functionality
There are lots of advantages if we collect the logs in any process
Logging can be done at all the levels in any application development
Java developer
Python developer
Web Application developer
Data Engineer
Data Scientist
Every application/process should collect the logs as MANDATORY in any enterprise environment


import logging
logging.debug('This is a debug message') - This is Level 1
logging.info('This is an info message') - This is Level 2
logging.warning('This is a warning message') - This is Level 3
logging.error('This is an error message') - This is Level 4
logging.critical('This is a critical message') - This is Level 5
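
By default only warning and above are shown; a small sketch of enabling all levels with a timestamped format (the format string is just an example):

import logging

# Configure the root logger once, near the start of the program
logging.basicConfig(
    level=logging.DEBUG,                            # show all five levels
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.debug("connection opened")
logging.error("insert into table failed")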
Posted in: Python | ID: Q94 |
May 07, 2022, 10:11 AM | 0 Replies
AVR posted:
2 years ago
What is File Handling in Python?
We manage the files
Why do we need to manage the files?
What is there in the files?
The normal file would have data in general

storages(blob storage account/data lake) - we store data in the files format
databases(sql/oracle/mysql) - we store data in the tables format


For reading files,
we could use plain python syntax
or
we could also use pyspark syntax


pyspark syntax and plain python syntax are different for reading/writing the files



If we want to read a file- then the file must be available
If we want to write data into a file - then the file could be available/not available
if available we could write the data
if not available we could create the file and write the data



What are the modes:
----------------------------
w is meant for write mode (File may be available or may not be available- File can be created at run time if it is not available)
a is meant for append mode (File may be available or may not be available- File can be created at run time if it is not available))
r is meant for read mode (File must be available)



What is the difference between write mode & append mode?
==============================================
write mode is nothing but overwriting - deleting the existing data and rewriting the new data
append mode is nothing but writing the data to the existing data without deleting the current data


If the file is available with some data - write mode removes everything and writes the new data
If the file is available with some data - append mode just adds the data without deleting the existing data
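
A minimal sketch of the three modes in plain Python (the file name is just an example):

# 'w' - write mode: creates the file if needed, overwrites any existing data
with open("report.txt", "w") as f:
    f.write("first line\n")

# 'a' - append mode: creates the file if needed, keeps existing data and adds to it
with open("report.txt", "a") as f:
    f.write("second line\n")

# 'r' - read mode: the file must already exist
with open("report.txt", "r") as f:
    print(f.read())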
Posted in: Python | ID: Q93 |
May 07, 2022, 10:08 AM | 0 Replies
AVR posted:
2 years ago
What do you know about Exception handling in python?

Exception handling would be there in all programming languages like
.Net
java
python
SQL
scripting(shell/bash)

Why do we need exception handling?
Exceptions are designed in such a way that the code shouldn't break if something is not available or if something is not working as expected

Exceptions are two types:
->System raised exception
->User raised exception
Typically, only if the run succeeds does it place the file in the storage account.
If the previous run failed, then the failed job wouldn't place any file in the storage account.
The next run wouldn't find the previous job's file, and eventually this run fails too
Run2 is dependent on the Run1 file
File not found exception is what we would see


How to make use of exception handling without failing Run2?
===========================================
Exception handling is nothing but this: we're not fixing the issue
We're not placing any temporary file to run the job
but
we are redirecting the flow to make the execution smooth
Without interrupting the flow of execution,
if the file is not located in the expected location, then please go to the archive folder and pick the backup file to read and run the job
We have the main folder & archive folder
the main folder is the place where the original file must be
archive folder is the place where we keep the backup file to run the job to avoid failures

NOTE:
====
raise is a keyword
this is what we use to raise our own exception
Go to the Python environment and run help("keywords")
We can see all reserved words, where raise is a part of it
try is another keyword
except is another keyword
else is another keyword
finally is another keyword





The sequence is as follows:
===================
try block-
This is a mandatory block in the exception handling
We need to keep risky code in the try block
Always try will execute as MANDATORY when we implement exception handling

except block -
When except block will execute?
Will this execute always?
Whenever an exception happens in the try block, then only does the execution come to the except block
The except block gets executed only if the try block encounters an exception
except block is NOT mandatory
Exception handling can also be done without an except block
We need to maintain alternative logic or alternative files

else block -
This is not a mandatory block
This is meant for printing successful messages
else executes only when there is NO EXCEPTION raised in the entire application code
If the try block doesn't have an exception then else block gets executed automatically
If the try block has an exception then else block wouldn't get executed

finally block -
This is not mandatory
This is for closing all database or storage connections
This executes always

raise block -
This is used to raise custom/user-defined exceptions
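
Putting the blocks together for the main/archive folder scenario described above (the paths are illustrative only):

main_path = "main/daily_input.csv"        # where the file normally lands
archive_path = "archive/daily_input.csv"  # backup copy kept to avoid failures

try:
    # risky code goes in the try block
    with open(main_path) as f:
        data = f.read()
except FileNotFoundError:
    # runs only when the try block raises this exception
    print("File not found in the main folder, reading the backup from the archive folder")
    with open(archive_path) as f:
        data = f.read()
else:
    # runs only when the try block raised no exception
    print("Read the file from the main folder successfully")
finally:
    # always runs - close database/storage connections here
    print("Run finished")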




Regular error messages are as follows:
========================
->syntaxerror: unmatched
->unsupported operand type(s) for 'int' and 'str'
->An exception occurred: name 'x' is not defined
->Variable x is not defined
->unexpected indent




How to check mount points?
======================
dbutils.fs.mounts()
Posted in: Azure | ID: Q92 |
May 07, 2022, 10:04 AM | 0 Replies
AVR posted:
2 years ago
Let's understand the Data flow architecture in Azure databricks.
We need to have good knowledge of how different components are connected and internally what happens when creating a databricks cluster.

Microsoft is providing Azure Databricks which means servers/storage will be used from the Microsoft Datacenter.
Likewise,
AWS is providing AWS Databricks, which means compute/storage will be used from the AWS Datacenter.

What is the difference between an All-purpose cluster & job cluster?
The all-purpose cluster is used in the development environment
The job cluster is used in the production environment


What is the difference between real-time data processing and batch data processing?
Real-time data processing is processing a negligible amount of data. If the data is very small and the processing takes a minute or two then we can consider this as real-time data processing.
What is Batch data processing?
If we are collecting one-hour data to process then we can call this an hourly batch data processing(small job)
If we are collecting twenty-four hours of data to process then we can call this a daily batch data processing(big job)
If the batch data processing is five minutes or ten minutes then we can call this small-batch data processing(a very small job)
Posted in: Azure | ID: Q91 |
April 18, 2022, 11:21 AM | 0 Replies
AVR posted:
2 years ago
What is Databricks & What do you know about Databricks?

Databricks is a new analytics service.
Azure databricks is a fast, easy, scalable, and collaborative apache-spark based analytics service on azure.

Why do we call it Fast? Because it uses a spark cluster
Why do we call it Easy? - We don't need any IDE like Eclipse, PyCharm, or Visual Studio to write the code
Why do we call it Scalable? - Dynamic allocation of the resources as per the requirement(nodes) is possible - We always need more nodes to process more data in databricks.
What is collaborative? - Data engineers/Data scientists/business users can work in Databricks notebook as collaborative work. Instead of working isolated, they all work in Databricks to achieve better productive work.
We can seamlessly connect from Databricks to other azure services (data lake/blob storage account/SQL server/azure synapse). It reduces cost and complexity with a managed platform that auto-scales up and down

Let's understand more about Azure Databricks Architecture
Once Databricks Workspace is created, we have the flexibility to create clusters
We also have a flexibility option to upload the data via DBFS though this is NOT recommended at the enterprise level considering the security as a high priority.
DBFS is a databricks file system.
When we store the data internally via DBFS, it gets stored backend in the storage account depending on the cloud we choose(AWS/AZURE).
If we choose AWS, then EC2 Instance would spin up and data gets stored internally at AWS S3.
If we choose AZURE then VM would spin up and data gets stored internally at the Blob storage account.
Databricks knows all the dependencies at the time of workspace creation. It creates all the pre-requisites that are needed for Databricks workspace.
Databricks cluster is nothing but a group of VMs.
When we create a cluster, VMs get created at the backend in the Azure Portal.
In order to run the notebook, we need to have a databricks cluster in place. We need to attach the notebook to the cluster to run the notebook where the notebook code gets executed.

Databricks has got 2 options. One is auto-scaling and the other one is to Terminate the cluster due to cluster inactivity. These two options are very helpful to reduce the cost.
Posted in: Azure | ID: Q90 |
April 18, 2022, 11:19 AM | 0 Replies
AVR posted:
2 years ago
How do we create a notebook in databricks, and what are our options while creating a notebook?
We can create a notebook using any of the below languages.
python
scala
SQL
R
When we create a notebook with python as an example, we also have an option to change that to another language scala after creating.
We have the flexibility to switch from one language to another.


Let's understand more about the databricks cluster.
What is a databricks cluster & why do we need a databricks cluster?
Let's assume that we have some data in the storage, and we need to process this data. To process this data with some ETL operations, we need some computing power to execute the notebook logic, and this is where the cluster would come in place.
A cluster is nothing but a group of nodes.
Apache spark clusters with multiple nodes have spark installation and spark features. They all work together internally to achieve a common goal.
Nodes are used to execute the notebook code
DBU stands for Databricks Unit; it is a normalized unit of processing capability per hour used for billing
The more nodes (and the bigger the nodes) in a cluster, the more DBUs it consumes

Below are the options we have while creating a databricks notebook:
Clone
Rename
Move
Delete
Upload Data
Export
Publish
Clear Revision History
Change Default Language

When we upload files in databricks workspace we have two types of formats. Spark API Format & File API Format.
How to Access Files from Notebooks?
PySpark - Code can be generated easily through the UI.
pandas - Code can be generated easily through the UI.
R - Code can be generated easily through the UI.
Scala - Code can be generated easily through the UI.


Below is an example of how we can read CSV files from the databricks filesystem:
df1 = spark.read.format("csv").option("header","true").load("dbfs:/FileStore/tables/credit.csv")
df1.show()


What are the formats we have while exporting databricks notebooks?
DBC Archive format comes as (.dbc) - This is an archived format that is not easy to read/understand
Source File format comes as (.py) assuming that we are exporting python notebook - This is easy to read/understand
IPython Notebook format comes as (.ipynb) - This is readable but again not in the proper format to understand
HTML format comes as (.html) - This is easy to read but again not in the proper format to understand


How do we import notebooks from the databricks workspace?
Go to the workspace - Make use of the import option wherever we want
We can import only one at a time as per the current databricks standards.


Publish notebook:
When a notebook is published we get a URL that is accessible publicly
The link will remain valid for 6 months in general
Go to browser - paste the URL - the link should work as expected


Clear revision history:
All notebook code changes get recorded and this is what we call versioning.
We can always go back to a version history to see all the code changes.
Versioning plays a major role in any development activity in general.
Versioning helps when something goes wrong with new version changes- This is where Rollback comes in place.


Change Default Language
Python/Scala/SQL/R
We can change the language to any of the above using the default language options given by the Databricks


Clear cell outputs only clears the outputs; the state is still active if the cells were executed
Clear state means the state is all cleared, and no values remain stored unless the cells get re-executed manually
Posted in: Azure | ID: Q89 |
April 17, 2022, 11:02 PM | 0 Replies