SELECT Syntax

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
      [HIGH_PRIORITY]
      [STRAIGHT_JOIN]
      [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
      [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr, ...
    [FROM table_references
    [WHERE where_condition]
    [GROUP BY {col_name | expr | position}
      [ASC | DESC], ... [WITH ROLLUP]]
    [HAVING where_condition]
    [ORDER BY {col_name | expr | position}
      [ASC | DESC], ...]
    [LIMIT {[offset,] row_count | row_count OFFSET offset}]
    [PROCEDURE procedure_name(argument_list)]
    [INTO OUTFILE 'file_name' export_options
      | INTO DUMPFILE 'file_name'
      | INTO var_name [, var_name]]
    [FOR UPDATE | LOCK IN SHARE MODE]]
SELECT is used to retrieve rows selected from one or more tables, and can include UNION statements and subqueries. See Section 13.2.7.3, “UNION Syntax”, and Section 13.2.8, “Subquery Syntax”.
The most commonly used clauses of SELECT statements are these:
Each select_expr indicates a column that you want to retrieve. There must be at least one select_expr.
table_references indicates the table or tables from which to retrieve rows. Its syntax is described in Section 13.2.7.1, “JOIN Syntax”.
The WHERE clause, if given, indicates the condition or conditions that rows must satisfy to be selected. where_condition is an expression that evaluates to true for each row to be selected. The statement selects all rows if there is no WHERE clause.
In the WHERE clause, you can use any of the functions and operators that MySQL supports, except for aggregate (summary) functions. See Chapter 12, Functions and Operators.
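For instance, ordinary functions and comparison operators can be combined freely in the filter (the orders table and its columns here are purely illustrative):
SELECT id, total FROM orders
WHERE YEAR(order_date) = 2006 AND (total > 100 OR customer_id IN (1,2,3));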
SELECT can also be used to retrieve rows computed without reference to any table.
For example:
mysql> SELECT 1 + 1;
-> 2
You are allowed to specify DUAL as a dummy table name in situations where no tables are referenced:
mysql> SELECT 1 + 1 FROM DUAL;
-> 2
DUAL is purely for the convenience of people who require that all SELECT statements should have FROM and possibly other clauses. MySQL may ignore the clauses. MySQL does not require FROM DUAL if no tables are referenced.
In general, clauses used must be given in exactly the order shown in the syntax description. For example, a HAVING clause must come after any GROUP BY clause and before any ORDER BY clause. The exception is that the INTO clause can appear either as shown in the syntax description or immediately preceding the FROM clause.
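As a sketch with an invented table t, both accepted placements of the INTO clause look like this:
SELECT a, b FROM t WHERE a > 0 LIMIT 1 INTO @x, @y;
SELECT a, b INTO @x, @y FROM t WHERE a > 0 LIMIT 1;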
A select_expr can be given an alias using AS alias_name. The alias is used as the expression's column name and can be used in GROUP BY, ORDER BY, or HAVING clauses. For example:
SELECT CONCAT(last_name,', ',first_name) AS full_name FROM mytable ORDER BY full_name;
The AS keyword is optional when aliasing a select_expr. The preceding example could have been written like this:
SELECT CONCAT(last_name,', ',first_name) full_name FROM mytable ORDER BY full_name;
However, because the AS is optional, a subtle problem can occur if you forget the comma between two select_expr expressions: MySQL interprets the second as an alias name. For example, in the following statement, columnb is treated as an alias name:
SELECT columna columnb FROM mytable;
For this reason, it is good practice to be in the habit of using AS explicitly when specifying column aliases.
It is not permissible to use a column alias in a WHERE clause, because the column value might not yet be determined when the WHERE clause is executed. See Section B.1.5.4, “Problems with Column Aliases”.
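A common workaround, sketched here with an invented table, is to repeat the expression in the WHERE clause, or to filter on the alias in HAVING instead:
SELECT price * qty AS amount FROM order_items WHERE price * qty > 100; -- repeat the expression
SELECT price * qty AS amount FROM order_items HAVING amount > 100;     -- aliases are permitted in HAVING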
The FROM table_references clause indicates the table or tables from which to retrieve rows. If you name more than one table, you are performing a join. For information on join syntax, see Section 13.2.7.1, “JOIN Syntax”. For each table specified, you can optionally specify an alias.
tbl_name [[AS] alias] [index_hint]
The use of index hints provides the optimizer with information about how to choose indexes during query processing. For a description of the syntax for specifying these hints, see Section 13.2.7.2, “Index Hint Syntax”.
You can use SET max_seeks_for_key=value as an alternative way to force MySQL to prefer key scans instead of table scans. See Section 5.2.3, “System Variables”.
You can refer to a table within the default database as tbl_name, or as db_name.tbl_name to specify a database explicitly. You can refer to a column as col_name, tbl_name.col_name, or db_name.tbl_name.col_name. You need not specify a tbl_name or db_name.tbl_name prefix for a column reference unless the reference would be ambiguous. See Section 9.2.1, “Identifier Qualifiers”, for examples of ambiguity that require the more explicit column reference forms.
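For example, assuming a database db1 containing a table t1 with a column c1, all of the following refer to the same column:
SELECT c1 FROM t1;
SELECT t1.c1 FROM t1;
SELECT db1.t1.c1 FROM db1.t1;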
A table reference can be aliased using tbl_name AS alias_name or tbl_name alias_name:
SELECT t1.name, t2.salary FROM employee AS t1, info AS t2 WHERE t1.name = t2.name;
SELECT t1.name, t2.salary FROM employee t1, info t2 WHERE t1.name = t2.name;
Columns selected for output can be referred to in ORDER BY and GROUP BY clauses using column names, column aliases, or column positions. Column positions are integers and begin with 1:
SELECT college, region, seed FROM tournament ORDER BY region, seed;
SELECT college, region AS r, seed AS s FROM tournament ORDER BY r, s;
SELECT college, region, seed FROM tournament ORDER BY 2, 3;
To sort in reverse order, add the DESC (descending) keyword to the name of the column in the ORDER BY clause that you are sorting by. The default is ascending order; this can be specified explicitly using the ASC keyword.
Use of column positions is deprecated because the syntax has been removed from the SQL standard.
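For example, using the tournament table from above, ascending and descending sorts can be mixed in a single ORDER BY:
SELECT college, region, seed FROM tournament ORDER BY region ASC, seed DESC;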
If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
MySQL extends the GROUP BY clause so that you can also specify ASC and DESC after columns named in the clause:
SELECT a, COUNT(b) FROM test_table GROUP BY a DESC;
MySQL extends the use of GROUP BY to allow selecting fields that are not mentioned in the GROUP BY clause. If you are not getting the results that you expect from your query, please read the description of GROUP BY found in Section 12.11, “Functions and Modifiers for Use with GROUP BY Clauses”.
GROUP BY allows a WITH ROLLUP modifier. See Section 12.11.2, “GROUP BY Modifiers”.
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
A HAVING clause can refer to any column or alias named in a select_expr in the SELECT list or in outer subqueries, and to aggregate functions. However, the SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions. To accommodate both standard SQL and the MySQL-specific ability to refer to columns in the SELECT list, MySQL 5.0.2 and up allows HAVING to refer to columns in the SELECT list, columns in the GROUP BY clause, columns in outer subqueries, and to aggregate functions.
For example, the following statement works in MySQL 5.0.2 but produces an error for earlier versions:
mysql> SELECT COUNT(*) FROM t GROUP BY col1 HAVING col1 = 2;
If the HAVING clause refers to a column that is ambiguous, a warning occurs. In the following statement, col2 is ambiguous because it is used as both an alias and a column name:
SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
Preference is given to standard SQL behavior, so if a HAVING column name is used both in GROUP BY and as an aliased column in the output column list, preference is given to the GROUP BY column.
Do not use HAVING for items that should be in the WHERE clause. For example, do not write the following:
SELECT col_name FROM tbl_name HAVING col_name > 0;
Write this instead:
SELECT col_name FROM tbl_name WHERE col_name > 0;
The HAVING clause can refer to aggregate functions, which the WHERE clause cannot:
SELECT user, MAX(salary) FROM users GROUP BY user HAVING MAX(salary) > 10;
(This did not work in some older versions of MySQL.)
MySQL allows duplicate column names. That is, there can be more than one select_expr with the same name. This is an extension to standard SQL. Because MySQL also allows GROUP BY and HAVING to refer to select_expr values, this can result in an ambiguity:
SELECT 12 AS a, a FROM t GROUP BY a;
In that statement, both columns have the name a. To ensure that the correct column is used for grouping, use different names for each select_expr.
MySQL resolves unqualified column or alias references in ORDER BY clauses by searching in the select_expr values, then in the columns of the tables in the FROM clause. For GROUP BY or HAVING clauses, it searches the FROM clause before searching in the select_expr values. (For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior that used the same rules as for ORDER BY.)
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be non-negative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
For prepared statements, you can use placeholders (supported as of MySQL version 5.0.7). The following statements will return one row from the tbl table:
SET @a=1; PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?'; EXECUTE STMT USING @a;
The following statements will return the second to sixth row from the tbl table:
SET @skip=1; SET @numrows=5; PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?, ?'; EXECUTE STMT USING @skip, @numrows;
For compatibility with PostgreSQL, MySQL also supports the LIMIT row_count OFFSET offset syntax.
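For example, the following two statements are equivalent:
SELECT * FROM tbl LIMIT 5,10;
SELECT * FROM tbl LIMIT 10 OFFSET 5;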
The SELECT ... INTO OUTFILE 'file_name' form of SELECT writes the selected rows to a file. The file is created on the server host, so you must have the FILE privilege to use this syntax. file_name cannot be an existing file, which among other things prevents files such as /etc/passwd and database tables from being destroyed. As of MySQL 5.0.19, the character_set_filesystem system variable controls the interpretation of the filename.
The SELECT ... INTO OUTFILE statement is intended primarily to let you very quickly dump a table to a text file on the server machine. If you want to create the resulting file on some client host other than the server host, you cannot use SELECT ... INTO OUTFILE. In that case, you should instead use a command such as mysql -e "SELECT ..." > file_name to generate the file on the client host.
SELECT ... INTO OUTFILE is the complement of LOAD DATA INFILE; the syntax for the export_options part of the statement consists of the same FIELDS and LINES clauses that are used with the LOAD DATA INFILE statement. See Section 13.2.5, “LOAD DATA INFILE Syntax”.
FIELDS ESCAPED BY controls how to write special characters. If the FIELDS ESCAPED BY character is not empty, it is used as a prefix that precedes following characters on output:
The FIELDS ESCAPED BY character
The FIELDS [OPTIONALLY] ENCLOSED BY character
The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values
ASCII NUL (the zero-valued byte; what is actually written following the escape character is ASCII ‘0’, not a zero-valued byte)
The FIELDS TERMINATED BY, ENCLOSED BY, ESCAPED BY, or LINES TERMINATED BY characters must be escaped so that you can read the file back in reliably. ASCII NUL is escaped to make it easier to view with some pagers.
The resulting file does not have to conform to SQL syntax, so nothing else need be escaped.
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
Here is an example that produces a file in the comma-separated values (CSV) format used by many programs:
SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n' FROM test_table;
If you use INTO DUMPFILE instead of INTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a BLOB value in a file.
The INTO clause can name a list of one or more variables, which can be user-defined variables, or parameters or local variables within a stored function or procedure body. The selected values are assigned to the variables. The number of variables must match the number of columns.
Within a stored routine, the variables can be routine parameters or local variables. See Section 17.2.7.3, “SELECT ... INTO Statement”.
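A minimal sketch, assuming a table t with columns a and b (one variable per selected column):
SELECT a, b INTO @va, @vb FROM t LIMIT 1;
SELECT @va, @vb;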
Note: Any file created by INTO OUTFILE or INTO DUMPFILE is writable by all users on the server host. The reason for this is that the MySQL server cannot create a file that is owned by anyone other than the user under whose account it is running. (You should never run mysqld as root for this and other reasons.) The file thus must be world-writable so that you can manipulate its contents.
The SELECT syntax description at the beginning of this section shows the INTO clause near the end of the statement. It is also possible to use INTO OUTFILE or INTO DUMPFILE immediately preceding the FROM clause.
A PROCEDURE clause names a procedure that should process the data in the result set. For an example, see Section 24.3.1, “Procedure Analyse”.
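For example, the built-in ANALYSE procedure described in that section can be invoked like this:
SELECT col1 FROM tbl PROCEDURE ANALYSE();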
If you use FOR UPDATE with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using LOCK IN SHARE MODE sets a shared lock that allows other transactions to read the examined rows but not to update or delete them. See Section 14.2.10.5, “SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE Locking Reads”.
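A typical locking-read pattern inside a transaction, sketched here with an invented accounts table:
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 3 FOR UPDATE; -- row stays write-locked until COMMIT
UPDATE accounts SET balance = balance - 10 WHERE id = 3;
COMMIT;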
Following the SELECT keyword, you can use a number of options that affect the operation of the statement.
The ALL, DISTINCT, and DISTINCTROW options specify whether duplicate rows should be returned. If none of these options are given, the default is ALL (all matching rows are returned). DISTINCT and DISTINCTROW are synonyms and specify removal of duplicate rows from the result set.
HIGH_PRIORITY, STRAIGHT_JOIN, and options beginning with SQL_ are MySQL extensions to standard SQL.
HIGH_PRIORITY gives the SELECT higher priority than a statement that updates a table. You should use this only for queries that are very fast and must be done at once. A SELECT HIGH_PRIORITY query that is issued while the table is locked for reading runs even if there is an update statement waiting for the table to be free.
HIGH_PRIORITY cannot be used with SELECT statements that are part of a UNION.
STRAIGHT_JOIN forces the optimizer to join the tables in the order in which they are listed in the FROM clause. You can use this to speed up a query if the optimizer joins the tables in non-optimal order. See Section 7.2.1, “Optimizing Queries with EXPLAIN”. STRAIGHT_JOIN also can be used in the table_references list. See Section 13.2.7.1, “JOIN Syntax”.
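A sketch of both placements, with invented tables:
SELECT STRAIGHT_JOIN t1.*, t2.* FROM t1, t2 WHERE t1.id = t2.t1_id;
SELECT t1.*, t2.* FROM t1 STRAIGHT_JOIN t2 ON t1.id = t2.t1_id;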
SQL_BIG_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set has many rows. In this case, MySQL directly uses disk-based temporary tables if needed, and prefers sorting to using a temporary table with a key on the GROUP BY elements.
SQL_BUFFER_RESULT forces the result to be put into a temporary table. This helps MySQL free the table locks early and helps in cases where it takes a long time to send the result set to the client.
SQL_SMALL_RESULT can be used with GROUP BY or DISTINCT to tell the optimizer that the result set is small. In this case, MySQL uses fast temporary tables to store the resulting table instead of using sorting. This should not normally be needed.
SQL_CALC_FOUND_ROWS tells MySQL to calculate how many rows there would be in the result set, disregarding any LIMIT clause. The number of rows can then be retrieved with SELECT FOUND_ROWS(). See Section 12.10.3, “Information Functions”.
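For example:
SELECT SQL_CALC_FOUND_ROWS * FROM tbl WHERE id > 100 LIMIT 10;
SELECT FOUND_ROWS();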
SQL_CACHE tells MySQL to store the query result in the query cache if you are using a query_cache_type value of 2 or DEMAND. For a query that uses UNION or subqueries, this option affects any SELECT in the query. See Section 5.13, “The MySQL Query Cache”.
SQL_NO_CACHE tells MySQL not to store the query result in the query cache. See Section 5.13, “The MySQL Query Cache”. For a query that uses UNION or subqueries, this option affects any SELECT in the query.
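For example, with query_cache_type set to 2 (DEMAND), an invented customer table could be queried as follows:
SELECT SQL_CACHE id, name FROM customer;    -- result goes into the query cache
SELECT SQL_NO_CACHE id, name FROM customer; -- result is not cached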
User Comments
You can simulate a CROSSTAB by the following method:
Use the IF function to select the key value of the sub table, as in:
SELECT
SUM(IF(beta_idx=1, beta_value,0)) as beta1_value,
SUM(IF(beta_idx=2, beta_value,0)) as beta2_value,
SUM(IF(beta_idx=3, beta_value,0)) as beta3_value
FROM alpha JOIN beta WHERE alpha_id = beta_alpha_id;
where alpha table has the form alpha_id, alpha_blah, alpha_blah_blah
and beta table has the form beta_alpha_id, beta_other stuff,
beta_idx, beta_value
This will create 3 columns with totals of beta values according to their idx field
when selecting a single random row you have to use a query like this: SELECT ... FROM my_table ORDER BY RAND() LIMIT 1.
as EXPLAIN shows, mysql optimizes this VERY badly (or, better said, doesn't optimize it at all): it uses a temporary table and an extra filesort.
couldn't this be optimized?!
if not, maybe add a syntax like SELECT RANDOM_ROW .... FROM my_table ...
This method of selecting a random row should be fast:
LOCK TABLES foo READ;
SELECT FLOOR(RAND() * COUNT(*)) AS rand_row FROM foo;
SELECT * FROM foo LIMIT $rand_row, 1;
UNLOCK TABLES;
Unfortunately, variables cannot be used in the LIMIT clause, otherwise the entire thing could be done completely in SQL.
In reply to David Philips:
If your tables are not all that big, a simpler method is:
SELECT * FROM foo ORDER BY RAND(NOW()) LIMIT 1;
If it's a big table, your method will almost certainly be faster.
If you want to find duplicates on a field that hasn't been uniquely indexed, you can do this:
SELECT BookISBN, count(BookISBN) FROM Books GROUP BY BookISBN HAVING COUNT(BookISBN)>1;
Sometimes you want to retrieve the records that DONT match a select statement.
Consider this select:
SELECT CarIndex FROM DealerCatalog, BigCatalog WHERE
DealerCatalog.CarIndex=BigCatalog.CarIndex
This finds all the CarIndex values in the Dealer's catalog that are in the bigger distributor catalog.
How do I then find the dealer CarIndex values that ARE NOT in the bigger catalog?
The answer is to use LEFT JOIN - anything that doesn't join is given a NULL value , so we look for that:
SELECT CarIndex FROM DealerCatalog LEFT JOIN BigCatalog ON DealerCatalog.CarIndex=BigCatalog.CarIndex WHERE BigCatalog.CarIndex IS NULL
To find double entries in a table:
SELECT db1.*
FROM tbl_data db1, tbl_data db2
WHERE db1.id <> db2.id
AND db1.name = db2.name
db1.id must be the PK
db1.name must be the field(s) that should be verified as double entries.
(I'm not sure whether the code is correct but in my case it works)
Johann
In order to anti-match fields by wildcards, one has to check whether the value of the field is not NULL:
For example: The table 'runs' contains 34876 rows. 205 rows have an 'info' field containing the string 'wrong'.
To select those rows for which the 'info' column does *NOT* contain the word 'wrong' one has to do:
mysql> select count(*) FROM runs WHERE info is null or info not like '%wrong%';
but not:
mysql> select count(*) FROM runs WHERE info not like '%wrong%';
which would lead to a much smaller number of selected rows.
I have managed to select random records using php and MySQL like the following:
$min=1;
$row=mysql_fetch_assoc(mysql_query("SHOW TABLE STATUS LIKE 'table';"));
$max=$row["Auto_increment"];
$random_id=rand($min,$max);
$row=mysql_fetch_assoc(mysql_query("SELECT * FROM table WHERE id='$random_id'"));
Voila...
Cezar
http://RO-Escorts.com
Random records without PHP, only MySQL:
select * from mailinglists order by rand() limit 1
Regards,
Geert van der Ploeg
All examples about selecting random row like:
SELECT * FROM foo ORDER BY RAND() LIMIT 1;
SELECT * FROM foo ORDER BY RAND(NOW()) LIMIT 1;
are not useful when you need to select randomly from only a subset of rows in a table.
For example, if the table has 1000 records and we want to select one record randomly from only 2 records (based on a WHERE clause), the above examples fail because the same row is returned all the time.
This is my version in this special case...
<?php
mopen();
$result=mysql_query("SELECT * FROM foo WHERE fieldx='value'");
$min=1; $max=mysql_num_rows($result);
if($max>0)
{
mysql_data_seek($result,rand($min,$max)-1);
$row=mysql_fetch_assoc($result);
echo_or_use_random_record_which_is_ $row ...
}
mclose();
?>
This code is selecting the needed rows, and picks one of the rows randomly.
This is very important when you need to select a huge number of rows like we did on http://fantezii.com
Cezar
Sometimes it is nice to use the SELECT query options like SQL_CALC_FOUND_ROWS or SQL_CACHE, but to maintain compatibility across different databases or even older versions of MySQL which do not support those options, it is possible to enclose them in a comment block, e.g.:
SELECT /*!40000 SQL_CALC_FOUND_ROWS */ foo,bar FROM some_table;
The /* construct will stop DBMS's other than MySQL from parsing the comment contents, while /*! will tell ALL MySQL versions to parse the "comment" (which is actually a non-comment to MySQL). The /*!40000 construct will tell MySQL servers starting from 4.0.0 (which is the first version to support SQL_CALC_FOUND_ROWS) to parse the comment, while earlier versions will ignore it.
I am using this way to select a random row or rows:
SELECT * [or any needed fields], idx*0+RAND() as rnd_id FROM tablename ORDER BY rnd_id LIMIT 1 [or the number of rows]
Meanwhile, I haven't stumbled into any problems with this usage.
I picked this method up in some forum; I don't remember when, where or by whom it was introduced :)
I found through some testing that ordering by dates causes a significant difference in execution time. If you have a primary key which is in line with your Date field (i.e. messages are inserted in date order), then order by ID instead; I found a massive 66% improvement across many queries.
My method of retrieving duplicate entries
In a database with personal information (name, surname, etc..) with an auto_increment index I wanted to retrieve all the entries with same name and surname field (duplicate names), which by accident were inserted to the base.
I used this syntax
SELECT name,surname,COUNT(name) AS cnt_n, COUNT(surname) AS cnt_s FROM the_table GROUP BY name HAVING cnt_n>1 AND cnt_s>1;
I hope this might be of help to anyone that wants to do some extended maintenance on the database
On the other hand, for this case it's simpler to engage an appropriate index if there is one:
CREATE INDEX ccr_news_insert_date_i ON ccr_news (insert_date DESC);
SELECT *
FROM ccr_news
WHERE insert_date > 0;
or, if for some reason MySQL still uses a full table scan:
SELECT *
FROM ccr_news FORCE INDEX (ccr_news_insert_date_i)
WHERE insert_date > 0;
If you want to ORDER BY [columnname] ASC
and have the NULL rows at the bottom
you can use ORDER BY -[columnname] DESC
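For example, with an illustrative numeric column price in a table t, the following sorts ascending with NULL prices last (the unary-minus trick only makes sense for numeric columns):
SELECT * FROM t ORDER BY -price DESC;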
Select Name,Category FROM authors ORDER BY Category,Name;
Will allow you to sort by categories listed in a separate table
IF the category column in this primary table contains ID values
from your ID column in your second reference table.
So your first "authors" table looks like:
id name category
1 Henry Miller 2
2 June Day 1
3 Thomas Wolf 2
and your second reference table looks like:
id category
1 Modern
2 Classics
Now when the order of categories is changed in the second table
the order of categories will be reflected in the primary table.
Then just select the categories from the reference table and put
the list into a numbered array. Then in your script when you run
across a category number from the first recordset just reference
the value from the index in the second array to obtain the value.
In php in the above example it might look like:
foreach ($recordset as $key => $record) {
echo $record["id"] . ":" . $record["name"] . ":" . $ordered_cats[$record["category"]];
}
This may seem obvious to some but I was pulling my hair out
trying to figure out how to order a recordset based on a list
from a different table. Hope this helps someone.
Ed
The LIMIT clause can be used when you would use TOP in Access or MS SQL.
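For example, something like Access's SELECT TOP 10 ... corresponds roughly to (table and column names invented):
SELECT * FROM tbl ORDER BY col DESC LIMIT 10;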
(Linux) By default, if you don't specify an absolute path for OUTFILE in
"
select ... into OUTFILE "..."
it creates the file in "/var/lib/mysql/
Make sure the user the MySQL server runs as has write permission in that directory.
MySQL 4 and later string comparisons, including DISTINCT, aren't case sensitive unless the field is declared as BINARY or you use BINARY in your comparison.
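For example, with an illustrative table t, BINARY forces a case-sensitive comparison:
SELECT * FROM t WHERE BINARY name = 'Smith'; -- matches 'Smith' but not 'SMITH'
SELECT DISTINCT BINARY name FROM t;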
If you want to find the rows that share identical values in a column (say, at least 8 of them):
SELECT managerId, COUNT(company) FROM manager GROUP BY company HAVING COUNT(company)>=8;
Regards,
Kumar.S
The keyword WHERE is NOT part of the "where_definition"!
The following definition of the "where_definition":
"where_definition consists of the keyword WHERE followed by
an expression that indicates the condition or conditions that
rows must satisfy to be selected"
is in contradiction with the given SELECT syntax:
SELECT
...
[FROM table_references
[WHERE where_definition]
I found a nifty way of influencing the ORDER of rows returned by a query that helps in displaying a list with frequently accessed items at the top.
An example is a name/address form where the country is a selectable list. If most of your users are from the UK and US you may want to do something like:
SELECT * FROM countries ORDER by iso_code IN ('UK', 'US') desc
Which returns something like:
Hope this helps someone! [email protected]
It seems there is no way to select * from table where a certain field is distinct. In 2 sqls it is easy:
$sql9 = "SELECT DISTINCT field AS distinctfield FROM table ORDER BY distinctfield ";
$res9= $db->execute($sql9);
for($ll=0; $ll < $res9->getNumTuples(); $ll++)
{
$row = $res9->getTupleDirect($ll);
$distinctfield = $row['distinctfield'];
$sql8="select * from table WHERE field='$distinctfield' ORDER BY distinctfield LIMIT 1";
}
But not one!
Fahed
reply to Fahed Bizzari's post, based on Havilland-Fortesque-Smedley's comment (above) the equivalent of select * while doing DISTINCT is:
select *, count(FIELD) from TABLE group by FIELD having count(FIELD)=1 into outfile 'foobar.txt';
then you can check the output. note that there are twice as many rows as records, because each unique row is followed by its count (in this case count=1). so just toss the .txt file into something and sort on the field containing the count and throw out all the rows =1. this is the same result as a select * distinct FIELD (as far as I can tell).
anyway, works for me. aloha. Lex
oh, about the previous post, it's not correct because distinct should be
count(FIELD)=>1
which still doesn't solve the DISTINCT part
Lex
In regards to:
______________________________________________________________________________________________
******************************************
I found a nifty way of influencing the ORDER of rows returned by a query that helps in displaying a list with frequently accessed items at the top.
An example is a name/address form where the country is a selectable list. If most of your users are from the UK and US you may want to do something like:
SELECT * FROM countries ORDER by iso_code IN ('UK', 'US') desc
Which returns something like:
******************************************
I found that if you also add another 'iso_code' column in the ORDER BY statement after the first one containing the IN() statement, it will sort the remaining records:
SELECT * FROM countries ORDER by iso_code IN ('UK', 'US') desc, iso_code
When using the SELECT ... INTO OUTFILE syntax, use a UNION to add headers. Here's an example for CSV output:
SELECT 'Fiscal Year','Location','Sales'
UNION
SELECT FY, Loc, Sales INTO OUTFILE 'salesreport.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
FROM SalesTable;
This will add the text headers Fiscal Year, Location and Sales to your fields. Only caveat is with an ORDER BY statement: if you don't want your headers sorted along with your data you need to enclose it in parentheses:
SELECT 'Fiscal Year','Location','Sales'
UNION
(SELECT FY, Loc, Sales INTO OUTFILE 'salesreport.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
FROM SalesTable
ORDER BY Sales DESC);
To correct Lex one more time, it should be count(FIELD)>=1.
So the whole query for retrieving a whole row with one field distinct is:
select *, count(FIELD) from TABLE group by FIELD having count(FIELD)>=1;
Thanks, Lex. You are a lifesaver.
As a newbie to MySQL and to dealing with BLOBs, I had a difficult time trying to determine how to extract a BLOB field from the database back to a file. It turns out to be quite simple by doing the following SQL:
select blobfield into dumpfile '/tmp/blobfile' from blobtable;
In response to Heywood's tip about adding column headers to OUTFILEs...
Make sure that the format of the columns that match up with your headers doesn't limit the display of the headers. For instance, I was using the UNION tip to add a header to a column defined as char(2) (for storing a two-letter state code). The resulting CSV file only displayed the first two letters of my column header. The fix is simple, just use CAST() on the column in the second SELECT to convert it to the appropriate type. In my case, doing something like this:
SELECT 'state header' FROM table UNION SELECT CAST(state AS char) FROM table INTO OUTFILE [...]
worked just dandy. Hope that saves someone a little time.
Arbitrary Ordering
I came across this example at
http://www.shawnolson.net/a/722/
Neat way of using the CASE statement.
Example for ordering price information
price is ordered ascending but the 0.00
prices end up underneath
SELECT dienst.dienst, dienst.url, dienst.info, dienst_prijs.dienst_eenheid, dienst_prijs.prijs, dienst_prijs.inc_btw, dienst_prijs.dienst_optie,
CASE
WHEN dienst_prijs.prijs = '0.00' THEN 1000
WHEN dienst_prijs.prijs > '0.00' THEN 10
ELSE NULL
END AS orderme
FROM dienst, dienst_prijs
WHERE dienst.taal = 'nl' &&
dienst.dienst_type = 'Internet toegang' &&
dienst.dienst != 'alle diensten' &&
dienst.publiceer != '' &&
dienst_prijs.dienst_eenheid IN ( 'maand', 'jaar' ) &&
dienst.dienst = dienst_prijs.dienst
ORDER BY orderme, dienst_prijs.prijs
If you want to use ORDER BY before GROUP BY, the only way I've found to achieve it is with a subquery.
For example, if you want to get a list of users from a table UserActions sorted according to the most recent action (based on a field called Time) the query would be:
SELECT * FROM (SELECT * FROM UserActions ORDER BY Time DESC) AS Actions GROUP BY UserID ORDER BY Time DESC;
Without the subquery, the group is performed first, and so the first record that appears in the database (which is not necessarily in the order you want) will be used to determine the sort order. This caused me huge problems as my data was in a jumbled order within the table.
--Edit--
This same result can be achieved with the use of MAX(Time), so the query would be:
SELECT *, MAX(Time) AS LatestAction FROM UserActions GROUP BY UserID ORDER BY LatestAction DESC;
As far as I can see, the subquery model still holds up if you need more complex sorting before performing the GROUP.
I've seen it asked elsewhere about how to select all duplicates, not just one row for each dupe.
CREATE TEMPORARY TABLE dupes SELECT * FROM tablename GROUP BY colname HAVING COUNT(*)>1 ORDER BY colname;
SELECT t.* FROM tablename t, dupes d WHERE t.colname = d.colname ORDER BY t.colname;
Be careful about the "SELECT...INTO OUTFILE" options. They are similar to, but not exactly the same as, the mysqldump options.
Two things:
1) The options in mysqldump can be in any order, because they are true command-line options (that is, they are conceptually used together, but syntactically separate on the mysqldump command line). The options in the SELECT...INTO OUTFILE need to be in the exact order as specified in the documentation above.
2) The options MUST have dashes between the words (e.g., fields-enclosed-by) when used as options with the mysqldump utility, but MUST NOT have dashes when used as options with the SELECT...INTO OUTFILE. This may not be clear in the documentation above.
Wayne
In reply to Fahed Bizzari et al...
If you want to select all fields from distinct rows why not use:
SELECT DISTINCT * FROM table GROUP BY field;
Don't forget the DISTINCT relates to the ORDER BY / GROUP BY and has nothing to do with the 'select_expr'
If you want the count as well then use:
SELECT DISTINCT *, count(*) AS count FROM table GROUP BY field;
Where is the pagination code as offered by the google search machine????
Kumar/Germany
If you have a binary string type field and you want a case insensitive sorting you can use CAST() as follow:
case sensitive example (DECODE return a binary string):
----------------------------------------------------------------------------
SELECT DECODE(EncodedField) AS DecodedField
FROM TableWithEncodedField
ORDER BY DecodedField;
case insensitive solution:
---------------------------------
SELECT CAST(DECODE(EncodedField) AS CHAR) AS DecodedField
FROM TableWithEncodedField
ORDER BY DecodedField;
I hope it may be useful.
To select specific rows from the table use the IN statement.
Example:
SELECT * FROM table WHERE myid IN (2, 16, 93,102);
This would return multiple rows based on specific criteria.
For large tables with auto incremented primary key values, I have found the following to be most efficient in obtaining one random row:
SELECT * FROM my_table
WHERE pk_column >=
(SELECT FLOOR( MAX(pk_column) * RAND()) FROM my_table)
ORDER BY pk_column
LIMIT 1;
Reply to Edward Hermanson post (above):
I prefer this way of sorting table by column values listed in another table:
The accnumber column in primary table contains ID values from ID column in the secondary table.
Primary table "contacts":
id name accnumber
1 Cooke 3
2 Peterson 3
3 Stevens 1
Secondary table "accounts":
id accname
1 Company1
2 Company2
3 Company3
SELECT contacts.lname, accounts.accname
FROM contacts, accounts
WHERE contacts.accnumber = accounts.id ORDER BY accname;
ORDER BY textfield in natural order!?
Let's say you want the following result:
File1
File2
File10
I haven't found a way to do it in SQL, here is a way to do it in PHP (just replace 'order_by' with the field you want to order by):
$result = mysql_query("SELECT order_by,... FROM table");
$rows = array();
if($result)
{
while(($row = mysql_fetch_array($result, MYSQL_ASSOC)))
$rows[] = $row;
usort($rows, create_function('$a, $b', 'return strnatcasecmp($a["order_by"], $b["order_by"]);'));
}
SELECT ... INTO OUTFILE requires the id to have the FILE privilege. That is,
GRANT SELECT, FILE ON * . * TO "[whomever]"@ "localhost";
As noted above, the output directory must be writable by the id under which the mysqld process is running. Use "grep user= /etc/my.cnf " to find it.
Selecting a random row in SQL? Try:
set @a = (SELECT COUNT(*)-1 c FROM palette)*RAND() DIV 1;
PREPARE STMT FROM 'SELECT * FROM palette LIMIT ?,1';
EXECUTE STMT USING @a;
If you want to keep field names, consider using mysqldump instead of SELECT INTO OUTFILE.
I use this method to transfer small amounts of data from our live database to our test database, for example when investigating a reported problem in our program code. (We cannot guarantee the field order across all our databases.)
rem Edit order number before running
rem Give password when prompted
rem Result files will be in current working directory
\mysql\bin\mysqldump livedb -uliveuser -p --no-create-info --tables orderpayment --where=orderid=2712>resultp.txt
\mysql\bin\mysqldump livedb -uliveuser -p --no-create-info --tables orderitem --where=orderid=2712>resulti.txt
\mysql\bin\mysqldump livedb -uliveuser -p --no-create-info --tables orderheader --where=id=2712>resulth.txt
when you meet more conditions, you may use the following code:
select * from yourdatabase where fieldone='value1' and fieldtwo='value2';
Fahed Bizzari, that is not 2 queries, that is $res9->getNumTuples() + 1 queries!
To select the identifiers with the greatest value in each class (where each identifier falls into one class):
SELECT id_class,id FROM tbl,(SELECT MAX(val) AS val FROM tbl GROUP BY id_class) AS _tbl WHERE tbl.val = _tbl.val;
We had a table logging state changes for a series of objects and wanted to find the most recent state for each object. The "val" in our case was an auto-increment field.
This seems to be the simplest solution that runs in a reasonable amount of time.
In a student signup list, use sql to find classes which are
not full. Involves combined use of RIGHT JOIN, COUNT, WHERE, GROUP BY, HAVING, and ORDER BY.
CREATE TABLE `classdescription` (
`ClassID` mediumint(9) NOT NULL auto_increment,
`ClassType` varchar(10) NOT NULL default '',
`ClassName` varchar(50) NOT NULL default '',
`ClassDate` datetime NOT NULL default '0000-00-00 00:00:00',
`ClassMax` mediumint(9) default NULL,
PRIMARY KEY (`ClassID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=1 ;
CREATE TABLE `class_signups` (
`s_PersonID` mediumint(9) NOT NULL default '0',
`s_ClassID` mediumint(9) NOT NULL default '0',
`s_Status` varchar(5) default NULL,
KEY `s_ClassID` (`s_ClassID`),
KEY `s_PersonID` (`s_PersonID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO `classdescription` VALUES (2, 'firstaid', '', '2005-01-02 11:00:00', 2);
INSERT INTO `classdescription` VALUES (3, 'advanced-med', '', '2005-01-02 13:00:00', 1);
INSERT INTO `class_signups` VALUES (11, 2, '');
INSERT INTO `class_signups` VALUES (12, 2, '');
Now use RIGHT JOIN to list all class descriptions along with signups if any,
SELECT cs.s_ClassID, cs.s_PersonID, cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID )
in itself, not too useful, but you can see classes
having no one signed up as a NULL.
To count the number of signups for each class:
SELECT cs.s_ClassID, COUNT(s_ClassID) AS ClassTotal, cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID )
GROUP BY cd.ClassID
The COUNT/GROUP BY options show a row per unique ClassID, and the COUNT is adding up
non-null occurrences of field s_ClassID. If we had used COUNT(*) then the class with
no signups would have counted 1 record, rather than the desired 0/NULL for no
signups.
Now we show only classes where the count of signups is less than ClassMax, meaning the
class has openings!
SELECT cs.s_ClassID, COUNT(s_ClassID) AS ClassTotal, cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID )
GROUP BY cd.ClassID
HAVING ClassTotal < cd.ClassMax
The HAVING clause limits the after-JOIN output rows to ones matching its criteria, discarding others!
We may want to look only at the firstaid ClassType, so add a WHERE clause to
the JOIN,
SELECT cs.s_ClassID, COUNT(s_ClassID) AS ClassTotal, cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID ) WHERE cd.ClassType='firstaid'
GROUP BY cd.ClassID
HAVING ClassTotal < cd.ClassMax
Now there are no outputs as firstaid is full, but
suppose we are looking in this list with respect
to a certain student PersonID==12. That is, we want to see classes this person can signup
for, including the ones they are already in!
In the case we need to disregard signups by PersonID==12 for e.g.,
SELECT cs.s_ClassID, COUNT(s_ClassID) AS ClassTotal, cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID AND cs.s_PersonID != 12) WHERE cd.ClassType='firstaid'
GROUP BY cd.ClassID
HAVING ClassTotal < cd.ClassMax
In the join we drop out signups of PersonID 12, so they don't get counted.
Finally we probably want to show the available classes in date order:
SELECT cs.s_ClassID, COUNT(s_ClassID) AS ClassTotal , cd.ClassName, cd.ClassID, cd.ClassType, cd.ClassDate, cd.ClassMax
from class_signups cs RIGHT JOIN
classdescription cd on (cs.s_ClassID = cd.ClassID AND cs.s_PersonID != 12)
WHERE cd.ClassType='firstaid'
GROUP BY cd.ClassID
HAVING ClassTotal < cd.ClassMax
I had fun figuring this out, I hope it works for you.
(sorry it was so long).
Rich
If you cancel a long-time running query by Ctrl-C, you might find the CPU load of mysqld remains at 99%. That's because the query is still running on mysqld, and Ctrl-C only closes the client.
Now, you can enter mysql again and use command SHOW PROCESSLIST to check the thread of the query, and kill the query by command KILL thread_id.
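As a sketch (the thread id shown is made up):
mysql> SHOW PROCESSLIST;
mysql> KILL 1234;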
I'm using mysql 5.0.21.
Be aware that SQL_CALC_FOUND_ROWS disables ORDER BY ... LIMIT optimizations (see bugs http://bugs.mysql.com/bug.php?id=18454 and http://bugs.mysql.com/bug.php?id=19553). Until it's fixed, you should run your own benchmarks with and without it.
Since the LIMIT clause of a SELECT statement doesn't allow user variables you can use a prepared statement as in the example above in the manual. An alternative is to load all record ids of yourTable into a temporary table as shown below. This also has the benefit of getting all data necessary for pagination of your result set:
CREATE PROCEDURE `listData`(IN _limitstart INT, IN _limit INT)
BEGIN
-- make a 'row container'
DROP TEMPORARY TABLE IF EXISTS AllRows;
CREATE TEMPORARY TABLE AllRows (rownum INT, id INT, label VARCHAR(50)) ENGINE=MEMORY;
-- insert all ids (and optional labels (for use in a page selector))
SET @a=-1;
INSERT INTO AllRows SELECT @a:=@a+1 AS rownum, id, CONCAT(first_name, ' ', last_name) AS label FROM yourTable;
## Output 1: total number of rows
SELECT @a+1 AS total_rows;
## Output 2: id/labels for pagination [see table 'NumberSeq' below]
SELECT * FROM AllRows
INNER JOIN NumberSeq ON AllRows.rownum = NumberSeq.n*_limit
WHERE (n*_limit) <= @a;
## Output 3: data for list
SELECT yourTable.* FROM yourTable
INNER JOIN AllRows ON yourTable.id = AllRows.id
WHERE rownum >= _limitstart AND rownum < _limitstart + _limit;
DROP TEMPORARY TABLE AllRows;
END
The NumberSeq table just contains the numbers 0, 1, 2, 3, ... 500 (or whatever limit you want to set on number of pages..):
CREATE PROCEDURE `createNumberSeq`()
BEGIN
DECLARE _n int default -1;
DROP TABLE IF EXISTS NumberSeq;
CREATE TABLE NumberSeq (n INT);
loop1: LOOP
SET _n = _n + 1;
INSERT INTO NumberSeq (n) VALUES (_n);
IF _n >= 500 THEN
LEAVE loop1;
END IF;
END LOOP loop1;
END
With smaller record sets the second approach is faster than the prepared statement approach. Haven't checked speed with bigger record sets, but suspect the first approach will win then...
Hope this helps to get around the limitations of the LIMIT clause. To the MySQL team: any plans to allow user variables in the LIMIT clause? (pleeeze!)
If you want to get the record in an specific order you can do it like this
SELECT * FROM table ORDER BY FIELD( id, 23, 234, 543, 23 )
or, if the table has a name column
SELECT * FROM table ORDER BY FIELD( name, 'miguel', 'rene', 'ana', 'tlaxcala' )
Example of using dynamic column_name parameters in the ORDER BY clause of a SELECT statement in stored procedures:
http://dev.mysql.com/doc/refman/5.1/en/control-flow-functions.html
(go to posting by Marc Grue on June 24 2006)
Or you could use this.
select [column], rand() as rnd from [table] order by rnd
Although there is still overhead compared to "order by null", it's not as bad as "order by rand()".
I was trying to figure out how to sort a varchar field which contained both numeric strings and alphanumeric strings. I wanted to sort it so that the numbers would be in order and then the alphanumeric entries would be in order. Here is the query that helped me accomplish that:
SELECT partnum, comments , if( partnum >0, cast( partnum AS SIGNED ) , 0 ) AS numpart,
if( partnum >0, 0, partnum ) AS stringpart
FROM `part`
ORDER BY `numpart` , `stringpart`
If you wish to use OUTFILE or DUMPFILE with a variable for the file name you cannot simply put it in place - MySQL will not resolve the name.
But you can put the whole command into a variable and use "prepare" and "execute" for example:
SELECT @myCommand := concat("SELECT * into OUTFILE '/home/mysql/archive/daemonLog-", DATE_FORMAT(now(),'%Y%m%d-%H%i%s'), "' FROM daemonLog");
PREPARE stmt FROM @myCommand;
EXECUTE stmt;
This will work, Good luck.
Just my little contribution when it comes to random row selection used with mysql & php. Based on the solution that consists of returning the count(*) of a table, then using that value to select a random row.
SELECT FLOOR(RAND() * COUNT(*)) AS rand_row FROM foo;
SELECT * FROM foo LIMIT {$rand_row}, 1;
or
SELECT COUNT(*) AS rows FROM foo;
SELECT * FROM foo LIMIT {rand(0,$rows-1)}, 1;
The problem with that solution from the MySQL standpoint is that there still remains the possibility of duplicate selections when we want more than one row, especially if the table is not that large (e.g. what are the chances of getting at least 2 duplicate rows while selecting 5 randomly, 1 at a time, out of a set of 10).
My approach is to rather generate unique random numbers from php, then fetch the corresponding table rows:
1- Use the appropriate php methods to fetch the table count from MySQL as done before:
SELECT COUNT(*) FROM foo;
2- Use php to generate some unique random numbers based on the count.
This is the php function that i use. It takes 3 arguments: the minimum and maximum range values, and the amount of unique random numbers to be returned. It returns these numbers as an array.
<?php
/*Array of Unique Random Numbers*/
function uniq_rand($min,$max,$size){
$randoms=array(); //this is our array
/*if ($min > $max) swap their value*/
if($min>$max){
$min=$min^$max;$max=$min^$max;$min=$min^$max;
}
/*if requested size of array is larger than possible
or if requested size of array is negative return*/
if ( (($max-$min)+1) < $size || $size < 0 ){return false;}
/*while array has not reached the requested size
keep generating random numbers and insert them
if they're not yet present in the array */
while (count($randoms) < $size){
$newval = rand($min,$max);
if(!in_array($newval,$randoms)){$randoms[] = $newval;}
}
return $randoms;
}
?>
3- Once you receive your set of randoms from the above function, perform a query for each random:
<?php
foreach($randoms as $random_row){
$query="SELECT * FROM foo LIMIT $random_row, 1;";
//perform query, retrieve values and move on to the next random row
...
}
?>
That's it
-----
On a side note regarding the php random number generation function that I have here, I'm sure it's not the best solution all the time. For example, the closer the amount of random numbers gets to the range of numbers available the less efficient the function gets, i.e. if you have a range of 300 numbers and you want 280 of them unique and random, the function could spend quite some time trying to get the last 10 numbers into the array. Some probabilities get involved here, but I suspect that it would be faster to insert the 300 numbers directly into an array, shuffle that array, then finally select the 280 first entries and return them.
Also, as pointed earlier in the thread, keep in mind that if your table isn't that large, just performing the following works very well (e.g. selecting 5 random rows on a moderately large table):
SELECT * FROM foo ORDER BY RAND() LIMIT 5;
If you want to use multilanguage queries you can do it like this:
Table 1
--------------
langid langname
--------------
1 rus
2 eng
3 den
---------------
Table 2 (catalog)
-----------------------
catid url
-----------------------
1 www.google.com
2 www.yandex.ru
3 www.mysql.com
------------------------
table 3 (titles of sites from Table 2)
-------------------------------------
langid catid title
-------------------------------------
1 1 Poiskovaya sistema
2 1 Search system
1 2 Portal
2 2 Portal
3 2 Portal
1 3 Sayt razrabotchikov MySQL
2 3 Site of MySQL's team
3 3 Bla bla bla
------------------------------------
And you need to select sites from table2 in a given language (for example Danish), but the site google.com has no Danish title. OK: if you can't select a title in the current language, you should select the title in the default language (here Russian). You can do it in one query:
SELECT *, (
SELECT title
FROM table3
WHERE table3.catid = table2.catid AND langid = 3
UNION
SELECT title
FROM table3
WHERE table3.catid = table2.catid AND langid = 1
LIMIT 1
) as title
FROM table2
It is very easy, but I think this query gets very heavy for MySQL if table2 contains around 1000-5000 rows and the site has 5000-6000 visitors per second.
You can write it another way:
SELECT *, (SELECT title FROM table3 ORDER BY IF(langid='1',0,1) ASC LIMIT 1) as title FROM `table2`
I couldn't compare these queries; if anybody can compare the speed of this method please write [email protected] (in Russian (:^).
Now my task is more complex; I need to select any site from table2:
1 - in the current language
2 - if the site has no such title, select the title in the default language
3 - if the site has no title in the default language, select the title in any language.
I think if I do it by that method, the query will be very big for MySQL.
In regards to:
Posted by Count Henry De Havilland-Fortesque-Smedley on January 13 2004 5:59am
--------------START QUOTE---------------------
Sometimes you want to retrieve the records that DONT match a select statement.
Consider this select:
SELECT CarIndex FROM DealerCatalog, BigCatalog WHERE
DealerCatalog.CarIndex=BigCatalog.CarIndex
This finds all the CarIndex values in the Dealer's catalog that are in the bigger distributor catalog.
How do I then find the dealer CarIndex values that ARE NOT in the bigger catalog?
The answer is to use LEFT JOIN - anything that doesn't join is given a NULL value , so we look for that:
SELECT CarIndex FROM DealerCatalog LEFT JOIN BigCatalog ON DealerCatalog.CarIndex=BigCatalog.CarIndex WHERE BigCatalog.CarIndex IS NULL
------------------END QUOTE--------------------------
I have found that the Left Join is quite expensive when doing this type of SQL Query. It is great if you have less than 1000 records in each table that you want to compare. But the real hardship is realized when you have 100,000 records in each table. Trying to do this type of join takes forever because each and every record in 1 table has to be compared to each and every record in the other table. In the case of 100,000 records, MySQL will do 10 BILLION comparisons (from what I have read, I may be mistaken).
So I tried the sql query above to see which rows in one table do not have a corresponding value in the other table. (Note that each table had close to 100,000 rows.) I waited for 10 minutes and the query was still going. I have since come up with a better way that works for me and I hope it will work for someone else. Here goes....
1: You must create another field in your base table. Let's call the new field `linked` (For the example above, we would perform this query ---ONLY ONCE--- to create the linked field in the DealerCatalog table.)
ALTER TABLE `DealerCatalog` ADD `linked` TINYINT NOT NULL ;
2: Now to get your results, simply execute the following queries instead of the left join query stated above
UPDATE `DealerCatalog` SET `linked` = 0;
UPDATE `DealerCatalog`, `BigCatalog` SET `linked` = 1 WHERE `DealerCatalog`.`CarIndex` = `BigCatalog`.`CarIndex`;
SELECT `CarIndex` FROM `DealerCatalog` WHERE `linked` = 0;
I know it is 3 queries instead of 1 but I am able to achieve the same result with 100K rows in each table in about 3 seconds instead of 10 minutes (That is just how long I waited until I gave up. Who knows how long it actually takes) using the LEFT JOIN method.
I would like to see if anyone else has a better way of dealing with this type of situation. I have been looking for a better solution for a few years now. I haven't tried MySQL 5 yet to see if there is a way to maybe create a view to deal with this situation but I suspect MySQL developers know about the expensive LEFT JOIN....IS NULL situation on large tables and are doing something about it.
Until then, you have my contribution
Ray Perea
Make sure you don't use stored functions in your WHERE clause if it is not necessary.
For our search feature, we needed to get an id using a stored function. Since it was in the WHERE clause, it reprocesses the function for every row! This could turn out to be pretty heavy.
If you can, do it in the FROM clause. Ex:
SELECT
...
FROM
...,
(select getSpecialID() as specialID) as tmp
WHERE
...
In our case we went from 6.5 sec query to 0.48 sec. We have over 2 million rows in our tables.