Importing data by using Logstash

This process is for advanced users. For a simpler approach, try importing data from a spreadsheet.

To stream live data, such as logs, into the Elasticsearch cluster, you can import the data by using Logstash. You can also import CSV or JSON files in this way.

The following section contains an example of how to configure Logstash for importing data. You can adapt this example for use with your own data set.

The data sets used in the example contain millions of records. If you use these data sets, loading will take some time to complete.

Before you begin

  1. Install Logstash.

  2. (Optional) To walk through this example before using your own data, download the following publicly available files:

    1. Download company data as one CSV file from this Web page.

    2. Download 'person of significant control' data as one JSON file from this Web page.

  3. Extract the .csv and .txt files.

  4. Open the example scripts and edit them to match the path and file names.

Creating configuration files

  1. Create a plain text file and enter the following content:

    input {
      file {
        path => "<location of BasicCompanyDataAsOneFile-date.csv>"
        start_position => "beginning"
      }
    }
    filter {
      csv {
        separator => ","
        autodetect_column_names => true
        autogenerate_column_names => true
      }
    }
    output {
      elasticsearch {
        hosts => ["127.0.0.1:9220"]
        index => "company"
      }
    }
  2. Edit the path to match the location of the .csv file, then save the file as logstash_csv.conf in the same folder as the data set.

  3. Create another plain text file and enter the following content:

    input {
      file {
        type => "json"
        path => "<location of persons-with-significant-control-snapshot-date.txt>"
        start_position => "beginning"
      }
    }
    filter {
      json {
        source => "message"
      }
      mutate {
        uppercase => [ "[data][name]" ]
      }
    }
    output {
      elasticsearch {
        hosts => ["127.0.0.1:9220"]
        index => "persons-control"
      }
    }
  4. Edit the path to match the location of the .txt file, then save the file as logstash_json.conf in the same folder as the data set.
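
The csv filter in the first configuration relies on autodetect_column_names, which treats the first row of the file as the field names for every later row. A minimal Python sketch of that behavior, using a fabricated two-line sample (the column names here are illustrative, not the real headers of the company data file):

```python
import csv
import io

# A fabricated two-line sample; the real file has many more columns and rows.
sample = "CompanyName,CompanyNumber\nACME LTD,01234567\n"

# With autodetect_column_names, Logstash reads the first row as field names,
# much as csv.DictReader does here.
reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)
print(rows[0]["CompanyName"])  # ACME LTD
```

Each subsequent row of the file becomes one event whose fields are named after the header row.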
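
In the second configuration, the json filter parses each line of the snapshot file into an event, and the mutate filter then uppercases the nested name field. A rough Python equivalent of those two steps, using a fabricated one-line record whose structure only loosely mirrors the real snapshot:

```python
import json

# A fabricated line from a JSON-lines snapshot file (illustrative only).
message = '{"company_number": "01234567", "data": {"name": "Jane Doe"}}'

# json filter: parse the raw message into structured fields.
event = json.loads(message)

# mutate/uppercase on [data][name]: uppercase the nested field in place.
event["data"]["name"] = event["data"]["name"].upper()
print(event["data"]["name"])  # JANE DOE
```

Note that Logstash addresses the nested field with its bracketed field-reference syntax, [data][name].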

Loading the data

From a command prompt, navigate to the logstash/bin folder and run Logstash with the configuration files that you created. For example, run the following commands:

logstash -f C:\data\logstash_csv.conf
logstash -f C:\data\logstash_json.conf

To speed up the import process, you can install a second instance of Logstash and run the imports concurrently.
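
While testing either configuration, it can help to see events as Logstash processes them. One option (an illustrative addition, not part of the steps above) is to add a stdout output with the rubydebug codec alongside the elasticsearch output, then remove it once the import looks correct:

```
output {
   elasticsearch {
       hosts => ["127.0.0.1:9220"]
       index => "company"
   }
   stdout {
       codec => rubydebug
   }
}
```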

Next steps

Add relations between the entities in your data.