Thursday 15 August 2019

What is fishbucket in Splunk


In this post, we will learn what the fishbucket in Splunk is. Before that, let us understand what Splunk is and what it is used for.
Splunk is used for monitoring, searching and analyzing machine data in real time. The data sources can range from application data and user-created data to sensors and devices.

Purpose of Splunk fishbucket

Before the data can be analyzed, Splunk must index it. But this raises an issue: what if the same data is indexed multiple times? In other words, how does Splunk avoid indexing the same chunk of data twice?
Splunk keeps seek pointers and CRCs for the files it has indexed inside a directory, and this directory is called the fishbucket. Because the fishbucket records which data has already been indexed, splunkd can tell whether a file has already been read and avoid indexing it again.

How fishbucket works?

The file monitor processor searches the fishbucket to see whether the CRC from the beginning of the file is already there. This is the first thing the file monitor processor does whenever it starts looking at a file. There are three possible scenarios:
Scenario 1: If the CRC is not present in the fishbucket, the file is indexed as new. This is the simple case: the file has never been indexed. After indexing, Splunk stores the CRC and seek pointer in the fishbucket.
Scenario 2: If the CRC is present in the fishbucket and the seek pointer is the same as the current end of file, the file has already been indexed and has not changed since it was last indexed. The seek pointer is what tells Splunk whether the file has changed.
Scenario 3: If the CRC is present in the fishbucket and the seek pointer is beyond the current end of file, something in the part of the file that has already been read has changed. Since Splunk cannot know what has changed, it re-indexes the whole file.
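The three scenarios can be sketched as a small Python decision function. This is a minimal model under stated assumptions, not Splunk's implementation: the fishbucket is a plain dictionary mapping a head CRC to a seek pointer, `zlib.crc32` over the first 256 bytes stands in for Splunk's own fingerprint (256 bytes is the documented default of `initCrcLength` in inputs.conf, but treat the constant as an assumption), and `check_file` and `HEAD_BYTES` are hypothetical names. The sketch also handles one case the scenarios do not list, where the file has only grown and just the appended bytes are read.

```python
import zlib

# Assumption: fingerprint the first 256 bytes of a file (hypothetical constant).
HEAD_BYTES = 256


def head_crc(data: bytes) -> int:
    """CRC of the beginning of the file; used as the fishbucket lookup key."""
    return zlib.crc32(data[:HEAD_BYTES])


def check_file(fishbucket: dict, data: bytes) -> str:
    """Classify a file against a hypothetical fishbucket (CRC -> seek pointer) and update it."""
    crc = head_crc(data)
    end_of_file = len(data)
    seek_pointer = fishbucket.get(crc)

    if seek_pointer is None:
        # Scenario 1: CRC not present, so the file is new and indexed from the start.
        action = "index as new"
    elif seek_pointer == end_of_file:
        # Scenario 2: already indexed and unchanged since then.
        action = "skip"
    elif seek_pointer > end_of_file:
        # Scenario 3: the already-read part changed (the file is shorter now),
        # so the whole file is re-indexed.
        action = "re-index whole file"
    else:
        # Not one of the three scenarios: the file only grew, so reading
        # resumes at the seek pointer and only the appended bytes are indexed.
        action = "read appended data"

    # After reading, the seek pointer moves to the current end of file.
    fishbucket[crc] = end_of_file
    return action


fishbucket = {}
log = b"event\n" * 100
print(check_file(fishbucket, log))                   # index as new
print(check_file(fishbucket, log))                   # skip
print(check_file(fishbucket, log + b"new event\n"))  # read appended data
```

Keying on the head CRC rather than the file name is what lets a renamed or rotated file be recognized as already read.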

Location of fishbucket directory

All these CRCs and seek pointers are stored by default in the following location:

Retention policy of fishbucket index

Via indexes.conf, we can change the retention policy of the fishbucket index. This may be needed if we are indexing a large number of files. But we need to be careful when changing the retention policy: if the CRCs and seek pointer of a file that has already been indexed are deleted because of the new retention policy, there is a risk of the same file being indexed again.
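To see why a shorter retention period can cause duplicate indexing, here is a hypothetical Python model. It simplifies retention to per-entry expiry (real Splunk retention works on index buckets, not individual entries), and `remember`, `apply_retention` and `would_reindex` are illustrative names, not Splunk functions. Once an entry expires, the file's head CRC is no longer found, so the file looks brand new.

```python
import zlib

DAY = 24 * 60 * 60  # seconds


def remember(fishbucket: dict, data: bytes, now: float) -> None:
    """Store the file's head CRC with its seek pointer and the time it was read."""
    fishbucket[zlib.crc32(data[:256])] = (len(data), now)


def apply_retention(fishbucket: dict, retention_secs: float, now: float) -> None:
    """Hypothetical retention: forget entries last updated more than retention_secs ago."""
    expired = [crc for crc, (_, seen) in fishbucket.items() if now - seen > retention_secs]
    for crc in expired:
        del fishbucket[crc]


def would_reindex(fishbucket: dict, data: bytes) -> bool:
    """A file whose head CRC is missing looks brand new and would be indexed again."""
    return zlib.crc32(data[:256]) not in fishbucket


fishbucket = {}
old_log = b"2019-08-01 12:00:00 event\n" * 50
remember(fishbucket, old_log, now=0)
apply_retention(fishbucket, retention_secs=30 * DAY, now=10 * DAY)
print(would_reindex(fishbucket, old_log))  # False: entry still retained
apply_retention(fishbucket, retention_secs=7 * DAY, now=10 * DAY)
print(would_reindex(fishbucket, old_log))  # True: entry expired, file looks new
```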

Ways to track down a particular file when needed

If you need to know when a particular file was indexed or re-indexed, you can search all the events in the fishbucket associated with it by file or source name, and check the seek pointer and modification time to get the required details.
You can also search the fishbucket through the GUI by searching for "index=_thefishbucket".
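The lookup described above amounts to filtering fishbucket records by source and ordering them by modification time. The sketch below assumes hypothetical records with `source`, `mod_time` (Unix epoch seconds) and `seek_pointer` fields; the real fishbucket event format may differ, and `file_history` and `describe` are illustrative names.

```python
from datetime import datetime, timezone


def file_history(records, source):
    """Return (mod_time, seek_pointer) pairs for one source, oldest first."""
    return sorted((r["mod_time"], r["seek_pointer"]) for r in records if r["source"] == source)


def describe(rows):
    """Format each row as a UTC timestamp followed by the seek pointer."""
    return [
        f"{datetime.fromtimestamp(t, tz=timezone.utc):%Y-%m-%d %H:%M} seek={p}"
        for t, p in rows
    ]


# Hypothetical records; field names are assumptions, not Splunk's schema.
records = [
    {"source": "/var/log/app.log", "mod_time": 1565913600, "seek_pointer": 4096},
    {"source": "/var/log/other.log", "mod_time": 1565830000, "seek_pointer": 512},
    {"source": "/var/log/app.log", "mod_time": 1565827200, "seek_pointer": 1024},
]
for line in describe(file_history(records, "/var/log/app.log")):
    print(line)
# 2019-08-15 00:00 seek=1024
# 2019-08-16 00:00 seek=4096
```

Under the scenarios above, a seek pointer that drops between two consecutive entries for the same source would hint that the file was re-indexed.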

That's all for the Splunk fishbucket. If you have any queries, please mention them in the comments section. Thanks.
Originally published at


Tuesday 13 August 2019

Introduction to DevOps on AWS


Amazon Web Services (AWS) is a cloud service from Amazon that provides services in the form of building blocks. These building blocks can be used to create and deploy any type of application in the cloud.

It is a comprehensive, easy-to-use computing platform that combines infrastructure as a service (IaaS), platform as a service (PaaS) and packaged software as a service (SaaS) offerings.

Advantages of AWS for DevOps

There are many benefits of using AWS for DevOps:

Get Started Fast

Each AWS service is ready to use if you have an AWS account. There is no setup required or software to install.

Fully Managed Services

These services can help you take advantage of AWS resources more quickly. You can worry less about setting up, installing, and operating infrastructure on your own, which lets you focus on your core product.

Built for Scale

You can manage a single instance or scale to thousands using AWS services. These services help you make the most of flexible compute resources by simplifying provisioning, configuration, and scaling.


Programmable

You have the option to use each service via the AWS Command Line Interface (CLI) or through APIs and SDKs. You can also model and provision AWS resources and your entire AWS infrastructure using declarative AWS CloudFormation templates.


Automation

AWS helps you use automation so you can build faster and more efficiently. Using AWS services, you can automate manual tasks or processes such as deployments, development and test workflows, container management, and configuration management.


Secure

Use AWS Identity and Access Management (IAM) to set user permissions and policies. This gives you granular control over who can access your resources and how they access those resources.
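The "who can access what" control above comes from IAM policy evaluation, where an explicit Deny overrides any Allow and anything not explicitly allowed is implicitly denied. The sketch below models only that core rule; it ignores conditions, principals, `NotAction`, permission boundaries and other real IAM features, uses `fnmatch` as an approximation of IAM wildcards, and the `reports` bucket and `is_allowed` helper are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return value if isinstance(value, list) else [value]


def is_allowed(statements, action, resource):
    """Simplified IAM-style check: explicit Deny wins, then Allow, else implicit deny."""
    matched = [
        s for s in statements
        # Action names compare case-insensitively; resources case-sensitively.
        if any(fnmatchcase(action.lower(), a.lower()) for a in _as_list(s["Action"]))
        and any(fnmatchcase(resource, r) for r in _as_list(s["Resource"]))
    ]
    if any(s["Effect"] == "Deny" for s in matched):
        return False
    return any(s["Effect"] == "Allow" for s in matched)


# Hypothetical policy: full S3 access under one bucket prefix, but no deletes.
policy = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::reports/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "arn:aws:s3:::reports/*"},
]

print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::reports/2019/aug.csv"))     # True
print(is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::reports/2019/aug.csv"))  # False
print(is_allowed(policy, "ec2:RunInstances", "*"))  # False (implicit deny)
```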

Buffer in Amazon Web Services

An Elastic Load Balancer ensures that the incoming traffic is distributed optimally across various AWS instances.

A buffer synchronizes different components and makes the arrangement more elastic to a burst of load or traffic.

Without a buffer, the components tend to receive and process requests at uneven rates. The buffer creates an equilibrium between the various components and makes them work at the same rate, which results in faster service.
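The effect of a buffer on a burst of traffic can be shown with a small simulation. This is a hypothetical, single-threaded model using `collections.deque` as the buffer (in AWS a service such as Amazon SQS can play this role, but nothing here uses its API); `simulate`, the tick-based arrivals and the per-tick `capacity` are illustrative assumptions.

```python
from collections import deque


def simulate(arrivals, capacity, buffered):
    """Return (processed, dropped) for a consumer handling `capacity` requests per tick."""
    queue = deque()
    processed = dropped = 0
    for tick, count in enumerate(arrivals):
        incoming = [f"req-{tick}-{i}" for i in range(count)]
        if buffered:
            # The buffer absorbs the whole burst; the consumer drains it at its own pace.
            queue.extend(incoming)
            batch = [queue.popleft() for _ in range(min(capacity, len(queue)))]
        else:
            # No buffer: whatever exceeds the consumer's capacity this tick is lost.
            batch = incoming[:capacity]
            dropped += len(incoming) - len(batch)
        processed += len(batch)
    return processed, dropped


burst = [10, 0, 0, 0]  # hypothetical: 10 requests arrive at once, then nothing
print(simulate(burst, capacity=3, buffered=False))  # (3, 7)
print(simulate(burst, capacity=3, buffered=True))   # (10, 0)
```

With the buffer, the consumer keeps working at a steady 3 requests per tick until the burst is cleared, instead of losing everything above its capacity.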

Components of Amazon Web Services

Amazon S3

With Amazon S3, one can store and retrieve the key information used in building the cloud architecture, and the output data that is produced can also be stored in it under a specified key.

Amazon EC2 instance

EC2 instances are helpful for running a large distributed system such as a Hadoop cluster, where automatic parallelization and job scheduling can be achieved.

Amazon SQS

This component acts as a mediator between different controllers. It is also used for buffering the requests received by the Amazon controller.

Amazon SimpleDB

It helps in storing the intermediate status log and the tasks executed by the consumers.

How is a Spot Instance different from an On-Demand Instance or a Reserved Instance?

Spot Instances, On-Demand Instances and Reserved Instances are all pricing models.

Spot Instances let customers purchase compute capacity with no upfront commitment, at hourly rates usually lower than the On-Demand rate in each region.

Spot Instances work much like an auction: the market price is called the Spot Price, and each customer specifies the maximum price they are willing to pay. The Spot Price fluctuates based on supply and demand for instances, but customers never pay more than the maximum price they have specified.

If the Spot Price rises above a customer's maximum price, the customer's EC2 instance is shut down automatically.

The reverse is not true, however: for a one-time Spot request, if the Spot Price comes down again, the EC2 instance is not launched again automatically, and one must do that manually.
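The Spot behaviour described above (billed at the Spot Price, capped by the maximum price, interrupted when the price rises above it, not relaunched when it falls) can be modelled in a few lines. This is a simplified sketch assuming hourly billing and a one-time request; the prices are hypothetical and `run_spot` is an illustrative name, not an AWS API.

```python
def run_spot(hourly_prices, max_price):
    """Hypothetical one-time Spot request billed per hour at the Spot Price.

    Returns (hours_run, total_cost). The instance runs while the Spot Price
    stays at or below max_price and is interrupted the first hour it rises above.
    """
    hours_run = 0
    total_cost = 0.0
    for price in hourly_prices:
        if price > max_price:
            # Interrupted; a one-time request is not relaunched when prices fall.
            break
        hours_run += 1
        total_cost += price  # billed at the Spot Price, never above max_price
    return hours_run, total_cost


# Hypothetical Spot Prices (USD per hour) for four consecutive hours.
prices = [0.030, 0.035, 0.050, 0.020]
hours, cost = run_spot(prices, max_price=0.040)
print(hours)          # 2: interrupted in hour 3, so the cheap fourth hour is never used
print(f"{cost:.3f}")  # 0.065
```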

With Spot and On-Demand Instances, there is no commitment to a duration on the user's side; with Reserved Instances, however, one must stick to the term one has chosen.

Amazon Elastic Container Service (ECS)

Amazon Elastic Container Service (ECS) is a highly scalable, high-performance container management service that supports Docker containers and allows us to easily run applications on a managed cluster of Amazon EC2 instances.

AWS Lambda in AWS DevOps

AWS Lambda lets us run code without provisioning or managing servers. With Lambda, we can run code for virtually any type of application or backend service, all with zero administration.

Just upload your code and Lambda takes care of everything required to run and scale your code with high availability.
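"Just upload your code" means, for the Python runtime, uploading a handler function that Lambda calls with an `event` and a `context` object. The sketch below is a minimal example under assumptions: `lambda_handler` is the console's default handler name (it is configurable), the `name` field in the event is hypothetical, and the `statusCode`/`body` response shape is what an API Gateway proxy integration expects, not a requirement for every trigger.

```python
import json


def lambda_handler(event, context):
    """Handler Lambda invokes per event; `context` carries runtime information."""
    # Hypothetical input: an optional "name" field in the event.
    name = (event or {}).get("name", "world")
    # Response shaped for an API Gateway proxy integration.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local check without AWS: Lambda would supply a real context object.
print(lambda_handler({"name": "DevOps"}, None)["body"])  # {"message": "Hello, DevOps!"}
```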

Amazon EC2 security best practices:

There are several best practices to secure Amazon EC2. A few of them are given below:

  • Use AWS Identity and Access Management (IAM) to control access to your AWS resources.
  • Restrict access by only allowing trusted hosts or networks to access ports on your instance.
  • Review the rules in your security groups regularly, and ensure that you apply the principle of least privilege: only open up permissions that you require.
  • Disable password-based logins for instances launched from your AMI. Passwords can be found or cracked and are a security risk.
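The first two practices can be checked mechanically: flag any inbound rule whose address range is not inside a trusted network. This sketch takes rules shaped like the `IpPermissions` entries that EC2's DescribeSecurityGroups returns, but the rules, the trusted range `10.0.0.0/16` and the `untrusted_rules` helper are hypothetical, and only IPv4 `IpRanges` are examined.

```python
import ipaddress


def untrusted_rules(ip_permissions, trusted_cidrs):
    """Return (protocol, from_port, to_port, cidr) for ranges outside the trusted networks."""
    trusted = [ipaddress.ip_network(c) for c in trusted_cidrs]
    findings = []
    for perm in ip_permissions:
        for ip_range in perm.get("IpRanges", []):
            network = ipaddress.ip_network(ip_range["CidrIp"])
            # A rule is acceptable only if its whole range sits inside a trusted network.
            if not any(network.subnet_of(t) for t in trusted):
                findings.append(
                    (perm["IpProtocol"], perm.get("FromPort"), perm.get("ToPort"), ip_range["CidrIp"])
                )
    return findings


# Hypothetical rules in the IpPermissions shape; 10.0.0.0/16 stands in for "trusted".
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "10.0.1.0/24"}]},
]
print(untrusted_rules(rules, ["10.0.0.0/16"]))  # [('tcp', 22, 22, '0.0.0.0/0')]
```

Here SSH open to `0.0.0.0/0` is flagged, while HTTPS from a subnet of the trusted network passes.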