Need to edge out the competition for your dream job? Train for certifications today.
Experts Exchange Solution brought to you by
"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.
use the scroll/scan API to find all matching ids and then issue a bulk request to delete them.
If your search requirements allow it, there is some room for optimization in the mapping definition of your index:
By default, Elasticsearch stores the original data in a special _source field. If you do not need it, disable it.
By default, Elasticsearch analyzes the input data of all fields in a special _all field. If you do not need it, disable it.
If you are using the _source field, there is no additional value in setting any other field to _stored.
If you are not using the _source field, only set those fields to _stored that you need to. Note, however, that using _source brings certain advantages, such as the ability to use the update API.
For analyzed fields, do you need norms? If not, disable them by setting norms.enabled to false.
Do you need to store term frequencies and positions, as is done by default, or can you do with less – maybe only doc numbers? Set index_options to what you really need, as outlined in the string core type description.
For analyzed fields, use the simplest analyzer that satisfies the requirements for the field. Or maybe you can even go with not_analyzed?
Do not analyze, store, or even send data to Elasticsearch that you do not need for answering search requests. In particular, double-check the content of mappings that you do not define yourself (e.g., because a tool like Logstash generates them for you).
Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.
When a document is deleted or updated (= delete + add), Apache Lucene simply marks a bit in a per-segment bitset to record that the document is deleted. All subsequent searches simply skip any deleted documents.
It is not until segments are merged that the bytes consumed by deleted documents are reclaimed. Likewise, any terms that occur only in deleted documents (ghost terms) are not removed until merge. This approach is necessary because it would otherwise be far too costly to update Lucene's write-once index data structures and aggregate statistics for every document deletion, but it has some implications:
Deleted documents tie up disk space in the index.
In-memory per-document data structures, such as norms or field data, will still consume RAM for deleted documents.
Search throughput is lower, since each search must check the deleted bitset for every potential hit. More on this below.
Aggregate term statistics, used for query scoring, will still reflect deleted terms and documents. When a merge completes, the term statistics will suddenly jump closer to their true values, changing hit scores. In practice this impact is minor, unless the deleted documents had divergent statistics from the rest of the index.
A deleted document ties up a document ID from the maximum 2.1 B documents for a single shard. If your shard is riding close to that limit (not recommended!) this could matter.
Fuzzy queries can have slightly different results, because they may match ghost terms.
Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.
var node = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(node);
var client = new ElasticClient(settings);
client.DeleteByQuery<ElasticsearchProject>(del => del
.Query(q => q.QueryString(qs=>qs.Query("*"))
Open in new window
The optimize API allows to optimize one or more indices through an API. The optimize process basically optimizes the index for faster search operations (and relates to the number of segments a Lucene index holds within each shard). The optimize operation allows to reduce the number of segments by merging them.
This call will block until the optimize is complete. If the http connection is lost, the request will continue in the background, and any new requests will block until the previous optimize is complete
When a document is deleted or updated (= delete + add), Apache Lucene simply marks a bit in a per-segment bitset to record that the document is deleted.
From novice to tech pro — start learning today.
Premium members can enroll in this course at no extra cost.
Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.
Have a better answer? Share it in a comment.
Please enter a first name
Please enter a last name
Must be at least 4 characters long.
Join and Comment